Mongo aggregation framework on big data

Question

Could you please help me with mongoDB aggregation. Here is what I would like to do next:

I have collection A. A document from A represents an object like:

{
  nameA: 'first',
  items: [
    'item1',
    'item2',
    'item3',
    'item4'
  ]
}

And I have the Collection B with documents like:

[
    {
      item: 'item3',
      info: 'info1'
    },
    {
      item: 'item3',
      info: 'info2'
    },
    {
      item: 'item3',
      info: 'info3'
    }
]

I work with big data, so it would be better to do it in one query. Imagine that we already have all data from collection A. I would like to build a query on collection B to get next structure result:

{
    'first'/*nameA*/: ['info1', 'info2', 'info3'],
    ....
}

How do I achieve the desired result with MongoDB aggregation?

This is not very helpful design, note that Mongo doesn't allow joins, there is $lookup to manipulate like left join, you can check out the documentation for $lookup in aggregation pipeline. — Rahul Kumar, Sep 23 '16 at 07:24

score 1 · Answer 1 · answered Sep 23 '16 at 08:36

As Rahul Kumar mentioned in his comment, your design is more leaning towards a relational database schema design, and it makes it quite difficult to design efficient MongoDB it.

However, it is still possible to achieve the functionality you are looking for by leveraging the $lookup stage of the aggregation framework, as follows:

db.A.aggregate([
    {
        $unwind: {
            path: "$items"
        }
    },
    { 
        $lookup: {
            from: "B",
            localField: "items",
            foreignField: "item",
            as: "item_info"
        }
    },

    {
        $unwind: {
            path: "$item_info"
        }
    },

    {
        $group: {
            _id: "$nameA",
            item_info: { $addToSet: "$item_info.info" }
        }
    }
]);

In the first $unwind stage you normalize the items array on collection A in order to be able to pass its output to the next stage
In the $lookup stage you make a left join between two collections that are part of the same database, in this case used to get the item information from collection B
In the second $unwind stage you normalize the data you extracted from collection B in order to flatten the array containing the objects from collection B that were mapped to the corresponding items in collection A
Finally, in the $group stage you group all the entries of the result set by nameA and create an array of unique item information values. If you would like to have all the duplicate occurrences of the item information values, you can replace the $addToSet accumulator with $push.

Below is the result of running the above aggregation pipeline on the collections that you provided:

{ "_id" : "second", "item_info" : [ "info3", "info2", "info1" ] }
{ "_id" : "first", "item_info" : [ "info3", "info2", "info1" ] }

Yep :) I understand it would be better to use relational database schema design, but it is already written. Really appreciate your help! Thank you very much! — Александр Янковский, Sep 23 '16 at 11:08

Mongo aggregation framework on big data

1 Answers1