I have a MongoDB database including hiking trails that I visualize in a Jupyter Notebook. Each trail has a start and destination and includes a timestamp when it was updated. So there are multiple entries for one start-destination pair.
I load all distinct trails in a list and iterate over it. For each trail I then want to find out
- how many updates there have been
- what the most recent date is that the trail was updated.
So I query the distinct timestamps and calculate their length like this:
nrUpdates = len(mycollection.distinct("mytimestamp", {"trailName": trail})
And to get the most recent update I then do:
mycollection.find({"trailName": trail}), {"mytimestamp":1, "_id":0}).sort("timestamp", -1).limit(1)
Is there any way to query the database in a more efficient way? I have to iterate several thousand trails and the performance is currently not high. Of course my queried fields are indexed.