5 MongoDB aggregate methods one should know

Published on 25 June
by Showvhick Nath

MongoDB is a well-designed and powerful NoSQL database. Its popularity keeps growing, and it is widely used with a variety of backend languages.

MongoDB covers the common ways of retrieving data that you know from SQL, and in some cases you can achieve exactly the same thing without writing huge queries. One of the most important things to know in MongoDB is the "aggregate" method. If you are coming from an SQL background, this is where you will find the $lookup stage, which is typically used as MongoDB's alternative to "JOIN" in SQL. In this article, I will give examples of 5 MongoDB aggregate methods which are useful to know.

1. Pipeline inside lookup

A pipeline is declared as an array of objects (stages). The stages execute in order, each one receiving the output of the previous stage. Did you know that you can run a pipeline inside a lookup too?

A lookup is the alternative to "JOIN" in SQL. A normal lookup object inside an aggregate pipeline looks like this -

db.collection('users').aggregate([
    {
        $lookup:{
            from:'Favorites',
            localField:'_id',
            foreignField:'user_id',
            as:'favorites'
        }
    }
])

Code-ex - 1

This query retrieves all users, and each user document gets a "favorites" key holding an array of that user's favourites.
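To make the shape of that result concrete, here is a rough in-memory sketch in plain JavaScript of what a basic lookup computes. It is only an illustration, not how MongoDB implements the stage: the `simpleLookup` helper and the sample documents are made up for this article, and it uses plain `===` matching, which ignores MongoDB's special handling of arrays and missing fields.

```javascript
// Hypothetical helper: mimics a basic $lookup (a left outer join) on plain arrays.
function simpleLookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    // Every foreign document whose foreignField equals this document's localField.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Made-up sample data.
const users = [
  { _id: 1, userName: 'Tom' },
  { _id: 2, userName: 'Ann' },
];
const favorites = [
  { _id: 'f1', user_id: 1, movie_id: 'm1', status: 'active' },
  { _id: 'f2', user_id: 1, movie_id: 'm2', status: 'inactive' },
];

const joined = simpleLookup(users, favorites, {
  localField: '_id',
  foreignField: 'user_id',
  as: 'favorites',
});

console.log(JSON.stringify(joined, null, 2));
```

Note that Ann has no favorites but still appears in the result with an empty "favorites" array. A real lookup behaves the same way, since it is a left outer join.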

In Code-ex - 1, the query only matches the user id in both collections, nothing else. What if you want to match on more fields? Then you have to use a pipeline inside the lookup. Let's see how it looks -

db.collection('users').aggregate([
    {
        $lookup:{
            from:'Favorites',
            let:{
                // let variable names must start with a lowercase letter
                userId:'$_id'
            },
            pipeline:[
                {
                    $match:{
                        $expr:{
                            $and:[
                                {
                                    $eq:[
                                        '$user_id',
                                        '$$userId'
                                    ]
                                },
                                {
                                    $eq:[
                                        '$status',
                                        'active'
                                    ]
                                    ]
                                }
                            ]
                        }
                    }
                }
            ],
            as:'favorites'
        }
    }
])

Code-ex - 2

Look at the example. Inside the pipeline, I check another key named "status": a favorite is included in the "favorites" array only if its status is active. Beyond this, you can use other expressions such as $in or $or to refine your query.
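As a small sketch of the $in idea, the helper below builds a lookup stage that keeps favorites whose status is any one of several values. The function name `favoritesLookupStage` and the 'pending' status are my own assumptions for illustration; the collection and field names follow the examples in this article.

```javascript
// Hypothetical helper: builds a $lookup stage that keeps favorites whose
// status is one of the given values.
function favoritesLookupStage(allowedStatuses) {
  return {
    $lookup: {
      from: 'Favorites',
      let: { userId: '$_id' },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$user_id', '$$userId'] },
                // Aggregation form of $in: [ <value>, <array> ]
                { $in: ['$status', allowedStatuses] },
              ],
            },
          },
        },
      ],
      as: 'favorites',
    },
  };
}

// 'pending' is a made-up status value, used only for this example.
const stage = favoritesLookupStage(['active', 'pending']);
console.log(JSON.stringify(stage, null, 2));
```

You would then pass the stage to aggregate as usual, for example db.collection('users').aggregate([stage]).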

2. Project inside lookup in Aggregate

You can add a $project stage inside the lookup pipeline. A "$project" stage lets you retrieve only the specified fields from each document. Suppose the documents in the 'Favorites' collection have more than 10 keys for different purposes and you don't need all of them in your aggregation. You can then retrieve only the 2 or 3 keys you need, right inside the lookup. Here is just the pipeline part of the lookup -

pipeline:[
{
    // assumes the enclosing $lookup declares let:{ userId:'$_id' }
    $match:{
        $expr:{
            $and:[
                {
                    $eq:[
                        '$user_id',
                        '$$userId'
                    ]
                },
                {
                    $eq:[
                        '$status',
                        'active'
                    ]
                }
            ]
        }
    }
},
{
    $project:{
        name:1,
        created_on:1
    }
}
],

Code-ex - 3

In the $project stage, I retrieve only two keys: "name" and "created_on". Both are projected from the lookup collection, "Favorites". (The _id field is also returned unless you exclude it with _id:0.) This is the quickest way to retrieve only the fields you need from a lookup.

3. Lookup inside the lookup

You can define a lookup right inside another lookup. Suppose you have built an application where users can log in and see a list of movies, and they can mark a few movies as "favourite". So you have three collections: "Users", "Movies" and "Favorites". The "Favorites" collection holds the relation between "Users" and "Movies": it has a key named "movie_id" which points to "_id" in the "Movies" collection. The movie name is stored in the "Movies" collection, so along with your "Favorites" lookup you need to get the name of the movie from "Movies". You can do this in two ways. The cleaner way is to put a lookup inside the lookup's pipeline. See the example below -

pipeline:[
{
    // assumes the enclosing $lookup declares let:{ userId:'$_id' }
    $match:{
        $expr:{
            $and:[
                {
                    $eq:[
                        '$user_id',
                        '$$userId'
                    ]
                },
                {
                    $eq:[
                        '$status',
                        'active'
                    ]
                }
            ]
        }
    }
},
{
    $lookup:{
        from:'Movies',
        localField:'movie_id',
        foreignField:'_id',
        as:'movie'
    }
},
{
    $project:{
        'movie.name':1,
        'created_on':1
    }
}
]

Code-ex - 4

If you want, the inner lookup can itself use a pipeline. That's how powerful MongoDB is.
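Here is a hedged sketch of that idea: the inner "Movies" lookup is written in pipeline form and placed inside the pipeline of the outer "Favorites" lookup. The variable names `movieId` and `userId` and the constant names are my own choices for illustration; I am assuming the outer lookup declares let: { userId: '$_id' }.

```javascript
// Inner $lookup written in pipeline form. It runs once per favorite document.
const movieLookupStage = {
  $lookup: {
    from: 'Movies',
    let: { movieId: '$movie_id' }, // movie_id of the current favorite
    pipeline: [
      { $match: { $expr: { $eq: ['$_id', '$$movieId'] } } },
      { $project: { _id: 0, name: 1 } }, // keep only the movie name
    ],
    as: 'movie',
  },
};

// Pipeline of the outer "Favorites" lookup, with the inner lookup nested in it.
// Assumes the outer $lookup declares let: { userId: '$_id' }.
const favoritesPipeline = [
  {
    $match: {
      $expr: {
        $and: [
          { $eq: ['$user_id', '$$userId'] },
          { $eq: ['$status', 'active'] },
        ],
      },
    },
  },
  movieLookupStage,
  { $project: { 'movie.name': 1, created_on: 1 } },
];

console.log(JSON.stringify(favoritesPipeline, null, 2));
```

The pipeline form is only worth the extra lines when the inner lookup needs more than a single equality match, for example a projection or an additional condition.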

As I mentioned, you can retrieve the data in a different way. In that case you declare a second lookup stage in the top-level aggregation pipeline. The example below shows how to do that -

db.collection('users').aggregate([
    {
        $lookup:{
            from:'Favorites',
            let:{
                // let variable names must start with a lowercase letter
                userId:'$_id'
            },
            pipeline:[
                {
                    $match:{
                        $expr:{
                            $and:[
                                {
                                    $eq:[
                                        '$user_id',
                                        '$$userId'
                                    ]
                                },
                                {
                                    $eq:[
                                        '$status',
                                        'active'
                                    ]
                                }
                            ]
                        }
                    }
                }
            ],
            as:'favorites'
        }
    },
    {
        $lookup:{
            from:'Movies',
            let:{
                movie_ids:'$favorites.movie_id'
            },
            pipeline:[
                {
                    $match:{
                        $expr:{
                            $in:[
                                '$_id',
                                '$$movie_ids'
                            ]
                        }
                    }
                }
            ],
            as:'favoriteMovies'
        }
    },
])

Code-ex - 6

Here I have used the $in operator to match the movies. Inside 'favoriteMovies' you get the list of the user's favorite movies along with their data from the "Movies" collection. The first way is a lot cleaner, but the second way gives you a more organised response. Try both and let me know which one you think is best.

4. Unwind in aggregation

Unwind is useful when you want to do something with the array produced by a lookup. Without unwind, the query stated above produces a response like this -

[
    {
        _id:1234,
        userName:'Tom',
        created_on:1291291291291,
        favorites:[
            {
                movie_id:'movie_id_1234'
            },
            {
                movie_id:'movie_id_2345'
            }
        ],
        favoriteMovies:[
            {
                _id:'movie_id_1234',
                name:'The Shawshank Redemption'
            },
            {
                _id:'movie_id_2345',
                name:'The Dark Knight'
            }
        ]
    }
]

Code-ex - 7

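Now if you add

{
    $unwind:'$favoriteMovies'
}

Code-ex - 8

just after the lookup stage, it produces one document per element of the favoriteMovies array, and in each document the array is replaced by that single element. Your response will look like this -

[
    {
        _id:1234,
        userName:'Tom',
        created_on:1291291291291,
        favorites:[
            { movie_id:'movie_id_1234' },
            { movie_id:'movie_id_2345' }
        ],
        favoriteMovies:{ _id:'movie_id_1234', name:'The Shawshank Redemption' }
    },
    {
        _id:1234,
        userName:'Tom',
        created_on:1291291291291,
        favorites:[
            { movie_id:'movie_id_1234' },
            { movie_id:'movie_id_2345' }
        ],
        favoriteMovies:{ _id:'movie_id_2345', name:'The Dark Knight' }
    }
]

Code-ex - 9

See, the response now has one document per favorite movie, which is easier to work with. The user's _id is preserved, because the movie's fields stay nested under favoriteMovies. An _id clash would only arise if you later merged the movie's fields into the top level of the document. To avoid that, project the user's _id under a different key name first, either in the user collection or in the movies lookup.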

Why is unwind useful? It turns the array into separate documents which you can feed into another lookup with multiple conditions. The query becomes more readable.
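If it helps, here is a minimal in-memory sketch of what unwind does to each document, written in plain JavaScript. The `unwind` helper and the sample data are made up for illustration and only approximate the default behaviour of the stage: documents whose array is missing, null or empty are dropped, and a non-array value is treated as a single element.

```javascript
// Hypothetical helper: mimics { $unwind: '$<field>' } on plain objects.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing/null -> no output; array -> its elements; anything else -> one element.
    const items = value == null ? [] : Array.isArray(value) ? value : [value];
    // One output document per element; the array is replaced by that element.
    return items.map((item) => ({ ...doc, [field]: item }));
  });
}

// Made-up sample data in the shape produced by the lookup examples.
const userDocs = [
  {
    _id: 1234,
    userName: 'Tom',
    favoriteMovies: [
      { _id: 'movie_id_1234', name: 'The Shawshank Redemption' },
      { _id: 'movie_id_2345', name: 'The Dark Knight' },
    ],
  },
  { _id: 5678, userName: 'Ann', favoriteMovies: [] },
];

const unwound = unwind(userDocs, 'favoriteMovies');
console.log(JSON.stringify(unwound, null, 2));
```

Tom's document comes out twice, once per movie, and each copy keeps Tom's own _id. Ann's document disappears because her array is empty.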

5. Group

Unwind breaks the array into separate documents, and the "group" stage groups documents back together based on a key. Group is very important and widely used, because it is also used to count the number of documents in each group. You can find a number of examples in the official MongoDB docs -

https://docs.mongodb.com/manual/reference/operator/aggregation/group/

But I am also providing an example of how group can help with an unwound query.

{
    $group: {
        // the user's _id is still present on every unwound document
        _id: "$_id",
        userName: { $first: "$userName" },
        favoriteMovies: {
            $push: {
                movie_name: "$favoriteMovies.name"
            }
        }
    }
}

Code-ex - 10

Let me explain exactly what's happening here. From the 'Users' collection I could have retrieved any keys, and I have taken userName (see Code-ex - 9). Now I am grouping on the user's "_id", so documents with the same user id are grouped together. Then a problem occurs: every document in a group carries its own copy of the other keys, so which value should be taken? For each key you want in the output, you have to specify an accumulator that says how its value is derived. Since "userName" is the same in all documents of a group, I take it from the first document only, with $first. Then I build the "favoriteMovies" array per user, using the $push operator.

So that's how you group an unwound response in a MongoDB aggregation.
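To see the grouping step in isolation, here is a rough plain-JavaScript sketch of the same idea: group unwound documents on the user's `_id`, take `userName` from the first document, and push each movie name into an array. I have also added a `count` field to show the counting use of group mentioned earlier; in a real stage that would be an accumulator like count: { $sum: 1 }. The `groupByUser` helper and the sample data are made up for illustration.

```javascript
// Hypothetical helper: mimics a $group stage keyed on the user's _id,
// with $first for userName, $push for movie names, and a $sum: 1 style count.
function groupByUser(unwoundDocs) {
  const groups = new Map();
  for (const doc of unwoundDocs) {
    if (!groups.has(doc._id)) {
      // First document of the group: this is where $first takes its value.
      groups.set(doc._id, {
        _id: doc._id,
        userName: doc.userName,
        favoriteMovies: [],
        count: 0,
      });
    }
    const group = groups.get(doc._id);
    group.favoriteMovies.push({ movie_name: doc.favoriteMovies.name }); // like $push
    group.count += 1; // like { $sum: 1 }
  }
  return [...groups.values()];
}

// Made-up unwound documents: one per (user, movie) pair.
const unwoundDocs = [
  { _id: 1234, userName: 'Tom', favoriteMovies: { name: 'The Shawshank Redemption' } },
  { _id: 1234, userName: 'Tom', favoriteMovies: { name: 'The Dark Knight' } },
  { _id: 5678, userName: 'Ann', favoriteMovies: { name: 'The Dark Knight' } },
];

const grouped = groupByUser(unwoundDocs);
console.log(JSON.stringify(grouped, null, 2));
```

One difference to keep in mind: this sketch returns groups in first-seen order, while MongoDB does not guarantee the order of group output unless you add a sort stage.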

There is a long list of other exciting things in the "aggregate" method, and I want to discuss all I know, but not in a single article. For that, subscribe to the blog and come back for part 2 and part 3. Stay tuned, and share this with your fellow developers.