
I have a collection with >100,000 documents that contain multiple nested arrays. I need to query based on a property located at the lowest level and return just the object at the bottom of the arrays.

Document Structure:

    {
        _id: 12345,
        type: "employee",
        people: [
            {
                name: "Rob",
                items: [
                    {
                        itemName: "RobsItemOne",
                        value: "$10.00",
                        description: "some description about the item"
                    },
                    {
                        itemName: "RobsItemTwo",
                        value: "$15.00",
                        description: "some description about the item"
                    }
                ]
            }
        ]
    }

I have been using the Aggregation Pipeline to get the expected results, which DOES WORK; however, the performance is pretty terrible. Here is my query:

db.collection.aggregate([
            {
                $match: {
                    "type": "employee"
                }
            },

            {$unwind: "$people"},
            {$unwind: "$people.items"},
            {$match: {$or: [ //There could be dozens of items included in this $match
                             {"people.items.itemName": "RobsItemOne"},
                             {"people.items.itemName": "RobsItemTwo"}
                           ]
                     }
            },
            {
                $project: {
                    _id: 0, // This is because of the $out
                    systemID: "$_id",
                    type: "$type",
                    item: "$people.items.itemName",
                    value: "$people.items.value"
                }
            },
            {$out: "tempCollection"} //Would like to avoid this, but was exceeding max document size
        ])

The results are:

[ 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemOne",
        "value" : "$10.00"
    }, 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemTwo",
        "value" : "$10.00"
    }
]

What can I do to make this query faster? I have tried using indexes, but per the Mongo docs, indexes are ignored after the initial $match.
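For example, an index like the one below (just a sketch to illustrate the point) only helps that first { type: "employee" } match; the $match stages that come after the $unwinds run against in-memory pipeline documents and cannot use it:

    // sketch: an index on "type" can only support the initial $match stage;
    // the later $match stages operate on unwound documents and ignore indexes
    db.collection.ensureIndex({ type: 1 });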


2 Answers


What you can additionally try is to add a $match stage to your pipeline just after you $unwind people.

...{$unwind: "$people"},
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
{$unwind: "$people.items"}, ....

This will bring down the number of documents to be processed by the following $unwind and $match stages.

Since you have a large number of records, you could also make use of the {allowDiskUse: true} option, which:

Enables writing to temporary files. When set to true, aggregation stages can write data to the _tmp subdirectory in the dbPath directory.

So, your final query would look like:

db.collection.aggregate([
        {
            $match: {
                "type": "employee"
            }
        },

        {$unwind: "$people"},
        {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
        {$unwind: "$people.items"},
        {$match: {$or: [ //There could be dozens of items included in this $match
                         {"people.items.itemName": "RobsItemOne"},
                         {"people.items.itemName": "RobsItemTwo"}
                       ]
                 }
        },
        {
            $project: {
                _id: 0, // This is because of the $out
                systemID: "$_id",
                type: "$type",
                item: "$people.items.itemName",
                value: "$people.items.value"
            }
        }

    ], {allowDiskUse:true})
  • I will give that a shot. Is Aggregation pipeline instead of Map Reduce the proper choice here? – Rob Oct 27 '14 at 19:42
  • Please have a look at this: http://stackoverflow.com/questions/16310730/mongodb-use-aggregation-framework-or-mapreduce-for-matching-array-of-strings-w – BatScream Oct 27 '14 at 23:53
  • In the above example, all the documents have unique keys, so the reduce function would not be called for all the documents. Even if you emit a common key for all the documents, the reduce function would have to take a lot of documents as input, and the processing would be much slower than the aggregation pipeline, since the pipeline eliminates documents in the $match stage. – BatScream Oct 28 '14 at 00:07

There are a few more things that can be improved on top of @BatScream's efforts. You can give the following a try.

// if the final result set is relatively small, this index will be helpful.
db.collection.ensureIndex({type : 1, "people.items.itemName" : 1 });

var itemCriteria = {
    $in : [ "RobsItemOne", "RobsItemTwo" ]
};

db.collection.aggregate([ {
    $match : {
        "type" : "employee",
        "people.items.itemName" : itemCriteria      // add this criteria to narrow source range further
    }
}, {
    $unwind : "$people"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // narrow data range further
    }
}, {
    $unwind : "$people.items"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // final match, avoid to use $or operator
    }
}, {
    $project : {
        _id : 0,                                    // This is because of the $out
        systemID : "$_id",
        type : "$type",
        item : "$people.items.itemName",
        value : "$people.items.value"
    }
}, {
    $out: "tempCollection"                          // optional
} ], {
    allowDiskUse : true
});
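
If you want to verify that the compound index is actually picked up by the first $match, one option (on MongoDB 2.6 or later) is to run just that stage with the explain option and inspect the reported plan:

    // run only the first $match with explain to see whether the index is used
    db.collection.aggregate([ {
        $match : {
            "type" : "employee",
            "people.items.itemName" : itemCriteria  // same criteria as above
        }
    } ], {
        explain : true                              // returns the plan instead of the results
    });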