
I am trying to query a list of documents from a collection using pymongo. The number of documents is usually above 10k. After that, I need to build a dictionary keyed by the `id` field. This is the snippet I use:

```python
import time

t1 = time.time()
mydocs = self.mypoint.find(myquery)
t2 = time.time()
print(f"[diagnostic] Time for query: {t2 - t1}")
# Build {id: document} from the query results
res = dict()
count_points = 0
for mydoc in mydocs:
    count_points += 1
    res[mydoc["id"]] = mydoc
t3 = time.time()
print(f"[diagnostic] Time to build dictionary: {t3 - t2} - n_points = {count_points}")
```

The `find` call is very fast, while the `for` loop is extremely slow, so I wonder whether there is a faster way to do this.
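One way to see where the time actually goes is to time the first document separately from the rest. Below is a minimal sketch, assuming `docs` is the cursor returned by `self.mypoint.find(myquery)` (any iterable of dicts works); `index_by_id_timed` is a hypothetical helper name, not part of pymongo.

```python
import time
from typing import Any, Dict, Iterable


def index_by_id_timed(docs: Iterable[Dict[str, Any]], key: str = "id") -> Dict[Any, Dict[str, Any]]:
    """Build {doc[key]: doc}, timing the first document separately from the rest."""
    it = iter(docs)
    t0 = time.perf_counter()
    # With a pymongo Cursor, the query is only sent to the server here,
    # so this measures query execution plus the first batch.
    first = next(it, None)
    t1 = time.perf_counter()
    res: Dict[Any, Dict[str, Any]] = {} if first is None else {first[key]: first}
    # Remaining documents, including any further batches fetched from the server.
    for doc in it:
        res[doc[key]] = doc  # a repeated key overwrites, as in the original loop
    t2 = time.perf_counter()
    print(f"[diagnostic] first document: {t1 - t0:.3f}s, "
          f"remaining documents: {t2 - t1:.3f}s, n_keys = {len(res)}")
    return res
```

Usage would be `res = index_by_id_timed(self.mypoint.find(myquery))`. Note that `n_keys` counts distinct ids, so it only equals the document count when `id` is unique.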

Thanks in advance, P

lateautumntear
  • The timing is probably deceptive. When you call `find` it doesn't return all of the results, it just creates a cursor and returns the first batch. It will request additional batches from the server as you iterate the cursor. – Joe Aug 26 '21 at 10:18
  • So should I set a larger `batch_size` when calling `find`? – lateautumntear Aug 26 '21 at 11:26
    Try fetching all of the documents like [this](https://stackoverflow.com/a/8723941/2282634), then you'll be able to separate the `find` time from the iterate time more accurately. – Joe Aug 26 '21 at 11:28
  • I tried this; it is very slow as well. – lateautumntear Aug 26 '21 at 11:52
  • But when using that, is it the `find` that is slow, or the `for` loop? – Joe Aug 26 '21 at 11:56
  • If I split the `find` and the `list` call into two lines, the `find` is very fast while the conversion to a list is very slow, comparable to the `for` loop in my snippet. – lateautumntear Aug 26 '21 at 12:08
  • Exactly, because the super-fast `find` only returns a cursor. Try using the `explain` database command to find out more about how the query runs. – Joe Aug 26 '21 at 12:17
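The lazy-cursor behaviour described in these comments can be illustrated without a database. The sketch below is a hypothetical, simplified model: `FakeCursor` and `demo` are made-up names, and batches here have a fixed size with a fixed simulated latency per round trip, which is not how a real MongoDB server sizes batches.

```python
import time
from collections import deque
from typing import Deque, Dict, Iterator


class FakeCursor:
    """Hypothetical, simplified stand-in for a pymongo Cursor.

    Creating it does no I/O; documents are fetched lazily, one batch per
    simulated server round trip, only as you iterate.
    """

    def __init__(self, n_docs: int, batch_size: int, latency: float) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be positive in this model")
        self.n_docs = n_docs
        self.batch_size = batch_size
        self.latency = latency  # simulated seconds per round trip
        self.round_trips = 0
        self._next_id = 0
        self._buffer: Deque[Dict[str, int]] = deque()

    def __iter__(self) -> Iterator[Dict[str, int]]:
        return self

    def __next__(self) -> Dict[str, int]:
        if not self._buffer:
            if self._next_id >= self.n_docs:
                raise StopIteration
            self._fetch_batch()
        return self._buffer.popleft()

    def _fetch_batch(self) -> None:
        time.sleep(self.latency)  # the "network" cost is paid here, not at creation
        self.round_trips += 1
        end = min(self._next_id + self.batch_size, self.n_docs)
        self._buffer.extend({"id": i} for i in range(self._next_id, end))
        self._next_id = end


def demo(batch_size: int, n_docs: int = 10_000, latency: float = 0.001) -> int:
    """Time cursor creation vs. iteration; return the number of round trips."""
    t0 = time.perf_counter()
    cursor = FakeCursor(n_docs, batch_size, latency)  # plays the role of find()
    t1 = time.perf_counter()
    res = {doc["id"]: doc for doc in cursor}  # every batch is fetched here
    t2 = time.perf_counter()
    print(f"batch_size={batch_size}: create {t1 - t0:.4f}s, iterate {t2 - t1:.4f}s, "
          f"round trips={cursor.round_trips}, n_points={len(res)}")
    return cursor.round_trips


if __name__ == "__main__":
    demo(batch_size=101)
    demo(batch_size=1000)
```

In this model the cost does not disappear; it simply moves from the `find` line to the loop (or to `list(cursor)`). Larger batches mean fewer round trips. With real pymongo, a batch size can be requested through `find(myquery, batch_size=...)` or `cursor.batch_size(n)`, but the server still limits how much data goes into a single batch, so the real effect has to be measured. `cursor.explain()` returns the query plan, which shows whether the query itself (for example, a missing index) is part of the problem.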
