MongoDB: Aggregation -Find latest record from collection

 

MongoDB, as we are aware is a NoSQL database. We can perform many aggregation operations on the MongoDB. In this post, I am going to share how we can use MongoDB to find the latest records from the collection.

We have multiple documents available in the MongoDB collection and need to just fetch the latest records. In SQL, we could use the Windows function like rank or row number to perform this operation but MongoDB we cannot use this approach.

What we going to use to find the latest record?

        Well answer to this question is aggregation methods. MongoDB support multiple aggregation methods like group by, sort, order by.  MongoDB also has aggregation pipeline operators like abs, avg, ceil, last, first, filter, etc. I have shared this in upcoming posts. In this post our objective is to find the latest records for each document from the collection, so we are going to use sort, group by and last(aggregation pipeline operator).

How to find the latest records

Let’s assume  we are having below data in the Mongo collection:

#Sample Data:

{ “_id” : 1, “eid” : “1”, “modifyDate” : ISODate(“2020-01-01T08:00:00Z”), “deptId” : 1, “country” : “IN” }

{ “_id” : 2, “eid” : “2”, “modifyDate” : ISODate(“2020-02-03T09:00:00Z”), “deptId” : 2, “country” : “US”}

{ “_id” : 3, “eid” : “3”, “modifyDate” : ISODate(“2020-03-03T09:05:00Z”), “deptId” : 1, “country” : “IN” }

{ “_id” : 4, “eid” : “1”, “modifyDate” : ISODate(“2020-02-15T08:00:00Z”), “deptId” : 1, “country” : “IN” }

{ “_id” : 5, “eid” : “3”, “modifyDate” : ISODate(“2020-02-15T09:05:00Z”), “deptId” : 3, “country” : “UK”}

{ “_id” : 6, “eid” : “2”, “modifyDate” : ISODate(“2020-02-15T12:05:10Z”), “deptId” : 2, “country” : “US”}

{ “_id” : 7, “eid” : “3”, “modifyDate” : ISODate(“2020-02-15T14:12:12Z”), “deptId” : 3, “country” : “UK”}

We have to find the latest records for each document, in our case we have done.

To sort the fields we need to set the order of sorting to 1 or -1. 1 is to sort the data in ascending order and -1 in descending order. In the group, you need to pass the columns which are required as output. If I define _id: {eid: “$eid”}, I will get eid column in the output along with the column on which condition is applied

$last is aggregated pipeline operator, which finds the latest record from the group of data.

db.collection.aggregate(

[

{ $sort: { eid: 1, modifyDate: 1} },

{

$group:

{

_id: {eid : “$eid” },

updatedDate: { $last: “$modifyDate” }

}

},

],

)

Output:

{ “_id” :  {“eid” : “1”, “modifyDate” : ISODate(“2020-02-15T08:00:00Z”) }}

{ “_id” :  {“eid” : “2”, “modifyDate” : ISODate(“2020-02-15T12:05:10Z”) }}

{ “_id” :  {“eid” : “3”, “modifyDate” : ISODate(“2020-02-15T14:12:12Z”) }}

If the data size will be huge, “memory issue” can be occurred. To avoid the memory issue need to use “allowDiskUse: true” property in the query.

db.collection.aggregate(

[

{ $sort: { eid: 1, modifyDate: 1} },

{

$group:

{

_id: {eid : “$eid”, country : “$country” },

updatedDate: { $last: “$modifyDate” }

}

}, ] }

], {allowDiskUse: true}

)

One disadvantage with the aggregation in MongoDB is aggregation commands cannot be used with the mongo export. If need to save the data to the files first need to create a temporary collection with the aggregated data and then use the mongo export command to export the data to the file

Sample command to create the temporary collection with the aggregated data. Use “$out” option to save the aggregated data to the collection.

db.collection.aggregate(

[

{ $sort: { eid: 1, modifyDate: 1} },

{

$group:

{

_id: {eid : “$eid”, country : “$country” },

updatedDate: { $last: “$modifyDate” }

}

}, { $out : “out_collection” }

], {allowDiskUse: true}

)

In the next posts, will learn the Aggregation Pipeline Operator followed by Aggregation Pipeline Stages in MongoDB

Kindly share your feedback and questions and subscribe Hadoop Tech and our facebook page(Hadoop Tech) for more articles. Follow on GitHub

 

 

 

 

 

 

Leave a comment