
I am trying to use the aggregation framework in MongoDB for some data stats. The query below, when run locally, takes barely a minute, but when I run the same query on the server it never returns a response; after waiting far too long I had to cancel it. Can anyone suggest why this is happening?

// ids of all delivered orders
var orderIds = db.delivery.find({ "status" : "DELIVERED" }).map(function(o) {
    return o.order;
});

// distinct customers among those orders
var userIds = db.order.aggregate([{
    $match : { _id : { $in : orderIds } }
}, {
    $group : { _id : "$customer" }
}]).map(function(u) { return u._id; });

// per-customer order count plus first and last order dates
var userstats = db.order.aggregate([{
    $sort : { customer : 1, dateCreated : 1 }
}, {
    $match : { status : "DELIVERED", customer : { $in : userIds } }
}, {
    $group : {
        _id : "$customer", orders : { $sum : 1 },
        firstOrderDate : { $first : "$dateCreated" },
        lastOrderDate : { $last : "$dateCreated" }
    }
}]);

// write the stats back onto each user document
userstats.forEach(function(x) {
    db.user.update({ _id : x._id }, {
        $set : {
            totalOrders : x.orders,
            firstOrderDate : x.firstOrderDate,
            lastOrderDate : x.lastOrderDate
        }
    });
});

I am not sure, but shouldn't it be faster on the server? Instead, it is not able to produce any output.

Neil Lunn
user29578

2 Answers


I recommend making $match the first stage in your pipeline, as the $match stage can only use an index when it comes first in the aggregation pipeline:

var userstats = db.order.aggregate([{
    $match : {
        status :"DELIVERED", 
        customer : { $in : userIds }
    }
}, {
    $sort : {
        customer : 1,
        dateCreated : 1
    }
}, { 
    $group : {  
        _id : "$customer",
        orders : { $sum : 1 }, 
        firstOrderDate: { $first : "$dateCreated" },
        lastOrderDate : { $last:"$dateCreated" }
    }
}]);

You should also add a compound index on status and customer, if you have not already defined one. Note that it belongs on the order collection, since that is the collection the aggregation's $match reads from:

db.order.createIndex({ status : 1, customer : 1 })
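
To confirm the index is actually picked up, you can ask the server for the query plan. A minimal mongo-shell sketch (assuming the index above exists on the order collection; look for an index scan, IXSCAN, in the output):

```javascript
// Hypothetical check in the mongo shell: explain the $match stage
// and inspect the winning plan for index usage.
db.order.aggregate(
    [{ $match : { status : "DELIVERED", customer : { $in : userIds } } }],
    { explain : true }
);
```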
John Slegers
Alex

To speed up the process you could refactor your operations in a couple of ways. First, eliminate the unnecessary $sort stage: it can be replaced by the $min and $max operators within the $group stage, which do not require sorted input.
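
The $min/$max idea can be illustrated outside MongoDB. A plain-JavaScript sketch (Node.js, made-up data, not the mongo shell) of the same grouping logic:

```javascript
// Group orders per customer, tracking count plus earliest and latest
// dateCreated -- the same result $group with $min/$max produces,
// without needing the input sorted first.
var orders = [
    { customer: "a", dateCreated: new Date("2016-01-05") },
    { customer: "a", dateCreated: new Date("2016-02-01") },
    { customer: "b", dateCreated: new Date("2016-01-10") }
];

var stats = {};
orders.forEach(function (o) {
    var s = stats[o.customer];
    if (!s) {
        s = stats[o.customer] = {
            orders: 0,
            firstOrderDate: o.dateCreated,
            lastOrderDate: o.dateCreated
        };
    }
    s.orders += 1;
    if (o.dateCreated < s.firstOrderDate) s.firstOrderDate = o.dateCreated; // like $min
    if (o.dateCreated > s.lastOrderDate) s.lastOrderDate = o.dateCreated;   // like $max
});

console.log(stats.a.orders); // 2
```

Because min and max are order-independent, dropping the $sort stage saves a full sort of all matched documents.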

Secondly, use the Bulk() API (available from MongoDB 2.6), which improves performance of update operations, especially on large collections, because it sends the operations to the server in batches (for example, a batch size of 500) instead of sending every request individually (as you are currently doing with the update statement inside the forEach() loop).
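
The flush-every-N pattern the bulk code relies on can be sketched independently of MongoDB (plain Node.js; `execute` here is just a stand-in for `bulk.execute()`):

```javascript
// Queue items and flush in fixed-size batches, with one final flush
// for the remainder -- the same shape as the bulk update loop below.
function flushInBatches(items, batchSize, execute) {
    var batch = [];
    items.forEach(function (item) {
        batch.push(item);
        if (batch.length === batchSize) {
            execute(batch); // one round trip for the whole batch
            batch = [];     // re-initialize, like initializeUnorderedBulkOp()
        }
    });
    if (batch.length > 0) execute(batch); // flush whatever is left over
}

var sizes = [];
var items = [];
for (var i = 0; i < 1230; i++) items.push(i);
flushInBatches(items, 500, function (b) { sizes.push(b.length); });
console.log(sizes); // [ 500, 500, 230 ]
```

With 1230 queued updates and a batch size of 500, the server sees 3 round trips instead of 1230.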

Consider the following refactored operations:

var orderIds = db.delivery.find({ "status": "DELIVERED" }).map(function(d) { return d.order; }),
    counter = 0,
    bulk = db.user.initializeUnorderedBulkOp();

var userstatsCursor = db.order.aggregate([
    { "$match": { "_id": { "$in": orderIds } } },
    {
        "$group": {
            "_id": "$customer",
            "orders": { "$sum": 1 },
            "firstOrderDate": { "$min": "$dateCreated" },
            "lastOrderDate": { "$max": "$dateCreated" }
        }
    }
]);

userstatsCursor.forEach(function (x){
    bulk.find({ "_id": x._id }).updateOne({ 
        "$set": { 
            "totalOrders": x.orders,
            "firstOrderDate": x.firstOrderDate,
            "lastOrderDate": x.lastOrderDate
        }
    });

    counter++;
    if (counter % 500 == 0) {
        bulk.execute(); // Execute per 500 operations and 
        // re-initialize every 500 update statements
        bulk = db.user.initializeUnorderedBulkOp();
    }
});

// Clean up remaining operations in queue
if (counter % 500 != 0) { bulk.execute(); }
chridam
  • the solution looks good, i can see a lot of new things, but the query is somehow throwing the syntax error "Unexpected token }" – user29578 Feb 12 '16 at 16:52
  • 1
    @user29578 There was a typo, I removed an extra closing brace before the pipeline array in my updated answer. Give that a try again. – chridam Feb 12 '16 at 16:56
  • hi chridam , i am getting this error 2016-02-14T02:32:33.615+0530 TypeError: Object function () { return new Bulk(this, false); } has no method 'find' at (shell):1:436 at Object.DBQuery.forEach (src/mongo/shell/query.js:281:9) at (shell):1:409 – user29578 Feb 13 '16 at 21:05
  • @user29578 Which MongoDB version are you using and which method did you try? – chridam Feb 14 '16 at 08:02
  • i am using 2.7.2, i am trying to run the same query from your answer , but it is throwing this error for find method on bulk, dont know why – user29578 Feb 14 '16 at 15:46
  • @user29578 Another typo once again, initializing the `bulk` instance was typed wrongly, it should be `bulk = db.user.initializeUnorderedBulkOp();` instead of `bulk = db.user.initializeUnorderedBulkOp;`, check updated answer, my apologies. – chridam Feb 14 '16 at 15:49
  • please correct me on this: when I use a batch size of 500 it shows nMatched: 113, and for a batch size of 1000 it shows 613; for different counter values I get different nMatched. nMatched is the number of documents matched for update — should it not be the same every time? – user29578 Feb 14 '16 at 17:57
  • from the documentation, nMatched is the number of update operations that matched. I am updating all users with delivered orders, around 30k of them, so can you please tell me why it shows such a small number, and a different one for each bulk size? – user29578 Feb 14 '16 at 18:28
  • @user29578 Please create a new question for this – chridam Feb 14 '16 at 18:29
  • created question, please check link http://stackoverflow.com/questions/35395630/bulkwriteresult-in-mongo-nmatched-and-no-of-documents-does-not-match – user29578 Feb 14 '16 at 18:42
  • hi, can i remind you for this question , appreciate if you can help.. http://stackoverflow.com/questions/35395630/bulkwriteresult-in-mongo-nmatched-and-no-of-documents-updated-does-not-match – user29578 Feb 15 '16 at 20:35