I have a collection of 286,484 documents, `collection`, each of which contains many fields; in particular, they all contain a `title` and a `pmid`. I want a dictionary that maps pmids to titles.
I expected this to be approximately instantaneous given the moderate amount of data. Instead, the code below reports a runtime of approximately 330.1 sec.
```python
import time

start = time.perf_counter()
papers = collection.find(projection={"_id": False, "title": True, "pmid": True})
papers2 = {paper["pmid"]: paper["title"] for paper in papers}
stop = time.perf_counter()
print(f"elapsed time: {stop - start} sec")
```
Why does this take so long, and how do I speed it up?
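For what it's worth, a self-contained sketch with synthetic data (made-up pmids and titles, no MongoDB involved) suggests the dict comprehension itself is cheap at this scale, so the time presumably goes to fetching and decoding the documents:

```python
import time

# Synthetic stand-ins for the 286,484 projected documents
# (pmids and titles here are made up, purely for scale).
docs = [{"pmid": i, "title": f"title {i}"} for i in range(286484)]

start = time.perf_counter()
papers2 = {d["pmid"]: d["title"] for d in docs}
stop = time.perf_counter()

print(f"dict build alone: {stop - start:.3f} sec")
print(len(papers2))
```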
Other relevant facts:
- I'm running Python 3.7.6 on Linux with pymongo 3.12.1 and MongoDB 4.4.0.
- I've verified that the projection works correctly (i.e. it returns `pmid` and `title` and nothing else).
- This is all running on a single cloud machine (i.e. database and code on the same machine, no sharding). It's not particularly high-powered, but there's free memory and no other simultaneous users.
- `pmid` is indexed; `title` is not.
- `explain` doesn't really help here because there's no filter. The `winningPlan` is `PROJECTION_SIMPLE` and there are no `rejectedPlans`.
- A possible clue: calling `explain` on the cursor took 440 sec.