
I have a MongoDB collection, collection, of 286484 documents. Each document contains many fields, but in particular they all contain a title and a pmid. I want to build a dictionary that maps pmids to titles.

I expected this to be nearly instantaneous given the moderate amount of data. Instead, the code below reports a runtime of approximately 330.1 sec.

import time

start = time.perf_counter()
# Request only pmid and title; the cursor is lazy, so documents are fetched while iterating below.
papers = collection.find(projection={"_id": False, "title": True, "pmid": True})
papers2 = {paper["pmid"]: paper["title"] for paper in papers}
stop = time.perf_counter()
print(f"elapsed time: {stop - start} sec")

Why does this take so long, and how do I speed it up?

Other relevant facts:

  • I'm running Python 3.7.6 on Linux with pymongo 3.12.1 and MongoDB 4.4.0.
  • I've verified that the projection works correctly (i.e. it returns pmid and title and nothing else).
  • This is all running on a single cloud machine (i.e. database and code on same machine, no sharding). It's not particularly high powered, but there's free memory and no other simultaneous users.
  • pmid is indexed; title is not.
  • explain doesn't really help here because there's no filter. The winningPlan is PROJECTION_SIMPLE and there are no rejectedPlans. A possible clue: calling explain on the cursor took 440 sec.
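
To narrow down where the time goes, the single measurement above could be split into phases. The following is only a minimal sketch: the helper name timed_pmid_to_title and the phase labels are made up for illustration, and it assumes the argument is any iterable of documents, such as the cursor returned by find.

```python
import time


def timed_pmid_to_title(docs):
    """Build {pmid: title} from an iterable of documents, timing each phase.

    Hypothetical helper for diagnosis; not part of pymongo.
    """
    t0 = time.perf_counter()
    it = iter(docs)
    # With a pymongo cursor, the query is only sent when iteration starts,
    # so this phase covers query execution plus the first batch.
    first = next(it, None)
    t1 = time.perf_counter()

    # Drain the remaining documents (for a cursor: the remaining batches).
    rows = [] if first is None else [first]
    rows.extend(it)
    t2 = time.perf_counter()

    # Pure Python work, with no database involved.
    mapping = {doc["pmid"]: doc["title"] for doc in rows}
    t3 = time.perf_counter()

    timings = {
        "first_doc": t1 - t0,
        "fetch_rest": t2 - t1,
        "build_dict": t3 - t2,
    }
    return mapping, timings
```

It would be called as mapping, timings = timed_pmid_to_title(collection.find(projection={"_id": False, "title": True, "pmid": True})). A large first_doc or fetch_rest would point at the server or the transfer, while a large build_dict would point at the Python side.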
ramcdougal
  • Since you are reading all documents in the collection, the read is slower. Generally, indexes become useful with filters (conditions) and sort operations. – prasad_ Oct 21 '21 at 13:52
  • I agree, but this is scanning at only 1000 items a second. That's really slow. – ramcdougal Oct 21 '21 at 13:53
  • Here is some relevant information in this SO post: [MongoDB Indexing and Projection](https://stackoverflow.com/questions/27286908/mongodb-indexing-and-projection). – prasad_ Oct 21 '21 at 13:55
  • Your timer may not be an accurate measure of query performance. The query is executed on the database server, and the server-side execution time doesn't include the time to send the query to the server, bring the results back, process them, etc. – prasad_ Oct 21 '21 at 14:00
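
Following the idea in the linked post, one thing that might be worth trying is a covered query, where a compound index holds both projected fields so the server can answer from the index without reading the full documents. This is a hedged sketch, not a verified fix: the function name pmid_to_title_covered and the batch size are illustrative, it assumes a compound index on (pmid, title) is acceptable for this data (titles can be long), and whether it helps on this dataset is untested.

```python
def pmid_to_title_covered(collection, batch_size=10_000):
    """Build {pmid: title} using a query that the (pmid, title) index can cover.

    Hypothetical helper; `collection` is assumed to be a pymongo Collection.
    """
    # Assumption: adding this compound index is acceptable. 1 means ascending.
    index_keys = [("pmid", 1), ("title", 1)]
    # create_index does nothing if an identical index already exists.
    collection.create_index(index_keys)

    cursor = (
        collection.find(
            {},
            # _id must be excluded, since it is not part of the index.
            projection={"_id": False, "pmid": True, "title": True},
        )
        .hint(index_keys)        # ask the server to scan this index
        .batch_size(batch_size)  # fewer round trips than the default batches
    )
    return {doc["pmid"]: doc["title"] for doc in cursor}
```

Calling explain on the same cursor should show whether the plan actually changed from a collection scan to an index scan.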
