
I'm running into the "aggregation result exceeds maximum document size (16MB)" error with MongoDB aggregation using pymongo.

I was able to overcome it at first using the $limit option. However, at some point I got the "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in." error.

OK, so I'll use the {'allowDiskUse': True} option. It works when I use it on the command line, but when I tried to use it in my Python code:

result = work1.aggregate(pipe, 'allowDiskUse:true')

I get a "TypeError: aggregate() takes exactly 2 arguments (3 given)" error, in spite of the signature given at http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.aggregate: aggregate(pipeline, **kwargs).
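The error follows from how Python binds arguments to a signature of the form aggregate(self, pipeline, **kwargs): a bare string after the pipeline is an extra positional argument, and the signature has no slot for it. The sketch below uses a hypothetical stand-in class, FakeCollection (not pymongo), with that same signature, so the behaviour can be reproduced without a MongoDB server.

```python
class FakeCollection:
    """Hypothetical stand-in (not pymongo) with the same signature as the
    documented Collection.aggregate(pipeline, **kwargs); it only echoes
    the arguments it receives."""

    def aggregate(self, pipeline, **kwargs):
        # Everything passed as name=value lands in kwargs.
        return {"pipeline": pipeline, "options": kwargs}


def try_call(call):
    """Run `call` and return either its result or the TypeError it raised."""
    try:
        return call()
    except TypeError as exc:
        return exc


work1 = FakeCollection()
pipe = [{"$limit": 1}]

# A string after the pipeline is a third positional argument (counting self),
# so Python rejects the call before aggregate() even runs.
bad = try_call(lambda: work1.aggregate(pipe, "allowDiskUse:true"))

# Passed as a keyword argument, the option arrives in kwargs as intended.
good = try_call(lambda: work1.aggregate(pipe, allowDiskUse=True))

print(type(bad).__name__)   # TypeError
print(good["options"])      # {'allowDiskUse': True}
```

The exact wording of the TypeError differs between Python 2 and Python 3, but in both cases the string is rejected because only name=value arguments can reach **kwargs.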

I tried to use runCommand, or rather its pymongo equivalent:

db.command('aggregate','work1',pipe, {'allowDiskUse':True})

But now I'm back to the "aggregation result exceeds maximum document size (16MB)" error.
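For reference: in pymongo's Database.command, the parameters after command and value are check and allowable_errors, and any extra fields for the command document are meant to be passed as keyword arguments. In the call above, pipe and the options dict therefore land in those other parameters rather than in the command itself. The sketch below shows the command document an aggregate with allowDiskUse would need; build_aggregate_command is a hypothetical helper, and the cursor={} field is an assumption based on newer servers (MongoDB 3.6+) requiring a cursor option for the aggregate command.

```python
def build_aggregate_command(collection_name, pipeline, allow_disk_use=True):
    """Hypothetical helper: build the document for MongoDB's aggregate command.

    The command name must be the first key; Python 3.7+ dicts keep
    insertion order, so a plain dict preserves it.
    """
    return {
        "aggregate": collection_name,
        "pipeline": pipeline,
        "allowDiskUse": allow_disk_use,
        # Assumption: newer servers (3.6+) require a cursor option for
        # aggregate; an empty document requests the default batch size.
        "cursor": {},
    }


pipe = [{"$group": {"_id": "$summary.trigrams", "count": {"$sum": 1}}}]
cmd = build_aggregate_command("work1", pipe)
print(list(cmd))  # ['aggregate', 'pipeline', 'allowDiskUse', 'cursor']

# With a live server (not run here), either form should send the same fields:
#   db.command(cmd)
#   db.command("aggregate", "work1", pipeline=pipe, allowDiskUse=True, cursor={})
```

Note that db.command returns the raw server reply (the first batch sits under cursor.firstBatch) rather than an iterable cursor, so Collection.aggregate with allowDiskUse=True is usually the simpler route.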

In case you need to know, this is the pipeline:

pipe = [
    {'$project': {'_id': 0, 'summary.trigrams': 1}},
    {'$unwind': '$summary'},
    {'$unwind': '$summary.trigrams'},
    {'$group': {'count': {'$sum': 1}, '_id': '$summary.trigrams'}},
    {'$sort': {'count': -1}},
    {'$limit': 10000},
]
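To see what this pipeline returns, and why its output grows with $limit, here is a rough pure-Python emulation of its stages. It runs on a tiny hypothetical collection (the sample documents are made up) and assumes summary is an array of sub-documents, each holding an array of string trigrams. The output is one {'_id': trigram, 'count': n} document per distinct trigram, capped by $limit.

```python
from collections import Counter


def emulate_pipeline(docs, limit):
    """Rough pure-Python equivalent of the $project/$unwind/$unwind/
    $group/$sort/$limit pipeline above (illustration only)."""
    counts = Counter()
    for doc in docs:
        # $project + $unwind '$summary': one row per element of summary.
        for item in doc.get("summary", []):
            # $unwind '$summary.trigrams': one row per trigram.
            for trigram in item.get("trigrams", []):
                # $group on the trigram with {'$sum': 1}.
                counts[trigram] += 1
    # $sort by count descending, then $limit.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [{"_id": t, "count": c} for t, c in ranked[:limit]]


# Hypothetical sample documents shaped the way the pipeline expects.
docs = [
    {"summary": [{"trigrams": ["a b c", "b c d"]}, {"trigrams": ["a b c"]}]},
    {"summary": [{"trigrams": ["a b c", "x y z"]}]},
]
print(emulate_pipeline(docs, limit=2))
# [{'_id': 'a b c', 'count': 3}, {'_id': 'b c d', 'count': 1}]
```

MongoDB does not guarantee an order for documents with equal counts, so ties may come back in a different order than in this emulation.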

Thank you

Neil Lunn
David Makovoz

1 Answer


So, in order:

  • aggregate is a method. It takes 2 positional arguments (self, which is passed implicitly, and pipeline) and any number of keyword arguments, which must be passed as foo=bar (if there's no = sign, it's not a keyword argument). This means you need to call result = work1.aggregate(pipe, allowDiskUse=True).

  • Your error about maximum document size is inherent to Mongo. Mongo can never return a single document larger than 16 megabytes, and when aggregate returns its results inline as one reply document, the whole result array counts toward that limit. I can't tell you exactly why you hit it without seeing your data, but it probably means that the document you're building as an end result is too large. Try decreasing the $limit parameter: start by setting it to 1, run a test, then increase it and look at how big the result gets.
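The "start at 1 and increase" test can be scripted. In the sketch below, with_limit is a hypothetical helper that copies the pipeline and rewrites its $limit stage; it needs only plain Python. The commented loop shows how it might be driven against a live server; it assumes PyMongo 3.9+ (for bson.encode) and treats the sum of per-document BSON sizes as a rough estimate of the result size.

```python
import copy


def with_limit(pipeline, n):
    """Hypothetical helper: return a deep copy of `pipeline` with its first
    $limit stage set to n, appending a $limit stage if none exists."""
    new_pipeline = copy.deepcopy(pipeline)
    for stage in new_pipeline:
        if "$limit" in stage:
            stage["$limit"] = n
            return new_pipeline
    new_pipeline.append({"$limit": n})
    return new_pipeline


pipe = [{"$sort": {"count": -1}}, {"$limit": 10000}]
print(with_limit(pipe, 1))  # [{'$sort': {'count': -1}}, {'$limit': 1}]
print(pipe[-1])             # original is unchanged: {'$limit': 10000}

# Against a live server (not run here; assumes PyMongo 3.9+ for bson.encode):
#   import bson
#   for n in (1, 10, 100, 1000, 10000):
#       docs = list(work1.aggregate(with_limit(pipe, n), allowDiskUse=True))
#       print(n, sum(len(bson.encode(d)) for d in docs), "bytes")
```

Deep-copying keeps the original pipe intact, so every probe starts from the same pipeline.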

Max Noel
  • "work1.aggregate(pipe, allowDiskUse=True)": that did the trick, perfect, thank you – David Makovoz Dec 03 '14 at 14:17
  • @Max Noel After adding `allowDiskUse=True` I no longer see the 16MB BSON size limit issue; however, I got another size error from the pymongo side: `raise DocumentTooLarge("command document too large") pymongo.errors.DocumentTooLarge: command document too large`. Have you experienced this? – Sam Feb 09 '18 at 04:37