
The line for doc in collection.find({'is_timeline_valid': True}): is raising a Message Length error. How can I iterate over the whole collection without hitting the error? I know about find().limit(), but I don't know how to use it.

Code:

from openpyxl import load_workbook
import pymongo
import os

wb = load_workbook('concilia.xlsx')
ws = wb.active
client = pymongo.MongoClient('...')
db = client['...']
collection = db['...']

r = 2
for doc in collection.find({'is_timeline_valid': True}):
    for dic in doc['timeline']['datas']:
        if 'concilia' in dic['tramite'].lower():
            ws.cell(row=r, column=1).value = doc['id_process_unformatted']
            ws.cell(row=r, column=2).value = dic['data']
            ws.cell(row=r, column=3).value = dic['tramite']
            wb.save('concilia.xlsx')
            print('*****************************')
            print(dic['tramite'])
            # print('check!')
            r += 1
– André
3 Answers


Here is a simple paginator that splits the query into a series of smaller paginated queries, so no single server response exceeds the message size limit.

from itertools import count

class PaginatedCursor(object):
    def __init__(self, cur, limit=100):
        self.cur = cur
        self.limit = limit
        # Total number of matching documents. Note that Cursor.count() was
        # deprecated in PyMongo 3.7 and removed in 4.0; on newer drivers use
        # collection.count_documents(filter) instead.
        self.count = cur.count()

    def __iter__(self):
        # Generate offsets 0, limit, 2*limit, ... until the count is exhausted.
        skipper = count(start=0, step=self.limit)

        for skip in skipper:
            if skip >= self.count:
                break

            # Fetch one page of at most `limit` documents.
            for document in self.cur.skip(skip).limit(self.limit):
                yield document

            # Reset the cursor so skip()/limit() can be chained again.
            self.cur.rewind()

...
cur = collection.find({'is_timeline_valid': True})
...
for doc in PaginatedCursor(cur, limit=100):
   ...
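
On PyMongo 4.x this won't run as written, because Cursor.count() was removed in 4.0. Here is a minimal sketch of the same idea adapted for that API; it assumes you pass in the collection and the query yourself (the class name PaginatedFind is my own, not part of any library):

from itertools import count

class PaginatedFind(object):
    # PyMongo 4.x variant: the cursor no longer carries a count, so the
    # collection and query are passed in and counted explicitly.
    def __init__(self, collection, query, limit=100):
        self.collection = collection
        self.query = query
        self.limit = limit
        self.count = collection.count_documents(query)

    def __iter__(self):
        for skip in count(start=0, step=self.limit):
            if skip >= self.count:
                break
            # Build a fresh cursor per page instead of rewinding one cursor.
            page = self.collection.find(self.query).skip(skip).limit(self.limit)
            for document in page:
                yield document

...
for doc in PaginatedFind(collection, {'is_timeline_valid': True}, limit=100):
   ...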
– Oluwafemi Sule

I ran into this problem today, and it turns out it was caused by the size of a particular document in the collection exceeding the max_bson_size limit. When adding documents to the collection, make sure each document stays below max_bson_size.

import json

document_size_limit = client.max_bson_size
# Rough check: the JSON encoding roughly tracks the BSON size.
assert len(json.dumps(data)) < document_size_limit

I'm currently investigating why the collection allowed a document larger than max_bson_size in the first place.
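
Note that len(json.dumps(data)) only approximates the stored size. The bson package bundled with PyMongo can measure the exact encoded size; here is a minimal sketch, assuming data is the document about to be inserted (bson.encode() exists in PyMongo 3.9+; older versions spell it bson.BSON.encode()):

import bson

# Exact size of the document as BSON, i.e. as MongoDB would store it.
encoded = bson.encode(data)
if len(encoded) >= client.max_bson_size:
    raise ValueError('document too large: %d bytes' % len(encoded))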

– Harvinder
  • I ran into the same issue but found client.max_message_size gave me the correct upper bound (~4MB whereas max_bson_size is ~16MB). – Patrick Jun 19 '19 at 16:47
  • Did you find an explanation to this? Stuck with the same issue... – Gustav Eiman Sep 03 '20 at 08:52
  • Yeah, the issue for me was that one of the fields in the doc was too big. It wasn't problematic during insertion but failed when querying - probably due to some assertions in the query logic that don't exist at insertion (weird). Some solutions are 1) compression, 2) storing the big text field in a blob store and keeping a reference in the doc. I started with 1, but recently switched to 2 because I don't actually read the text blob that often. – Harvinder Sep 03 '20 at 12:44

We can pass batch_size to find() to reduce the size of each message the server sends back.

for doc in collection.find({'is_timeline_valid': True}):

becomes

for doc in collection.find({'is_timeline_valid': True}, batch_size=1):
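
The same option can also be chained on the cursor. A batch_size of 1 means one server round trip per document, so in practice you'd pick the largest value that still keeps each batch under the message limit; the 100 below is just an assumption, not a measured value:

# Equivalent chained form: Cursor.batch_size() caps how many documents
# the server returns per batch, trading round trips for message size.
cursor = collection.find({'is_timeline_valid': True}).batch_size(100)
for doc in cursor:
    ...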
– Patrick