1

I have data stored in MongoDB in the following format.

{
    "_id" : ObjectId("570b487fb5360dd1e5ef840c"),
    "internal_id" : 1,
    "created_at" : ISODate("2015-07-14T10:08:38.994Z"),
    "updated_at" : ISODate("2016-01-10T00:35:19.748Z"),
    "ad_account_id" : 1,
    "updated_time" : "2013-08-05T04:48:49-0700",
    "created_time" : "2013-08-05T04:46:35-0700",
    "name" : "Sale1",
    "daily": [
                 {"clicks": 5000, "date": "2015-04-16"},
                 {"clicks": 5100, "date": "2015-04-17"},
                 {"clicks": 5030, "date": "2015-04-20"}
             ],
    "custom_tags" : {
        "Event" : {
            "name" : "Clicks"
        },
        "Objective" : {
            "name" : "Sale"
        },
        "Image" : {
            "name" : "43c3fe7b262cde5f476ed303e472c65a"
        },
        "Goal" : {
            "name" : "10"
        },
        "Type" : {
             "name" : "None"
        },
        "Call To Action" : {
            "name" : "None"
        },
        "Landing Pages" : {
            "name" : "www.google.com"
        }
    }
}

I am trying to group documents by internal_id and compute the total of the clicks values in the daily array over a date range (say, 2015-04-15 to 2015-04-21) using the aggregate method.
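For reference, the kind of pipeline described above could be sketched roughly as follows. This is only an illustration, not the exact pipeline in use: the helper name build_clicks_pipeline and the output field total_clicks are made up for the example, and it assumes the "date" values in daily are always "YYYY-MM-DD" strings, so plain string comparison matches chronological order.

```python
def build_clicks_pipeline(start_date, end_date):
    """Build a pipeline that sums daily clicks per internal_id within a date range.

    Hypothetical helper for illustration; dates are "YYYY-MM-DD" strings.
    """
    return [
        # Emit one document per entry of the "daily" array.
        {"$unwind": "$daily"},
        # Keep only entries whose date falls inside the inclusive range.
        {"$match": {"daily.date": {"$gte": start_date, "$lte": end_date}}},
        # Sum the clicks of the remaining entries for each internal_id.
        {"$group": {"_id": "$internal_id",
                    "total_clicks": {"$sum": "$daily.clicks"}}},
    ]


pipeline = build_clicks_pipeline("2015-04-15", "2015-04-21")

# With a real pymongo collection (assumed here to be bound to `collection`):
# results = list(collection.aggregate(pipeline, allowDiskUse=True))
```

Filtering right after $unwind keeps the input to $group as small as possible, which matters when $group is the stage hitting the memory limit.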

In pymongo, when I run an aggregation that uses $project on internal_id only, I get results, but when I also $project the custom_tags fields, I get the following error:

OperationFailure: Exceeded memory limit for $group, but didn't allow external sort.
Pass allowDiskUse:true to opt in.

Following the answer here, I changed my aggregate call to list(collection._get_collection().aggregate(mongo_query["pipeline"], allowDiskUse=True)), but it still throws the same error.

maverick93

2 Answers

3

Take a look at this link: Can't get allowDiskUse:True to work with pymongo

This works for me:

someSampleList = db.collectionName.aggregate(pipeline, allowDiskUse=True)

Where

pipeline = [
    {'$sort': {'sortField': 1}},
    {'$group': {'_id': '$distinctField'}}, 
    {'$limit': 20000}]
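
A small self-contained sketch of the same call shape, in case it helps check how the option is passed. FakeCollection and aggregate_with_disk_use are hypothetical names invented for this example; the fake only stands in for a pymongo collection so the snippet runs without a server. The point it illustrates is that allowDiskUse goes in as a Python keyword argument rather than inside a dict.

```python
def aggregate_with_disk_use(collection, pipeline):
    """Run the pipeline and return the results as a list."""
    # allowDiskUse is passed as a keyword argument, not inside a dict.
    return list(collection.aggregate(pipeline, allowDiskUse=True))


class FakeCollection:
    """Hypothetical stand-in for a pymongo collection; records the calls it receives."""

    def __init__(self):
        self.calls = []

    def aggregate(self, pipeline, **kwargs):
        # Remember what was passed so it can be inspected afterwards.
        self.calls.append((pipeline, kwargs))
        return iter([{"_id": "a"}, {"_id": "b"}])


pipeline = [
    {"$sort": {"sortField": 1}},
    {"$group": {"_id": "$distinctField"}},
    {"$limit": 20000},
]

fake = FakeCollection()
results = aggregate_with_disk_use(fake, pipeline)
```

With a real collection, the same helper would be called with db.collectionName in place of the fake object.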
-1

Try this:

list(collection._get_collection().aggregate(mongo_query["pipeline"], allowDiskUse=True))

pakkk