Improve MongoDB query in Python

Question

I have a MongoDB database including hiking trails that I visualize in a Jupyter Notebook. Each trail has a start and destination and includes a timestamp when it was updated. So there are multiple entries for one start-destination pair.

I load all distinct trails in a list and iterate over it. For each trail I then want to find out

how many updates there have been
what the most recent date is that the trail was updated.

So I query the distinct timestamps and calculate their length like this:

nrUpdates = len(mycollection.distinct("mytimestamp", {"trailName": trail})

And to get the most recent update I then do:

mycollection.find({"trailName": trail}), {"mytimestamp":1, "_id":0}).sort("timestamp", -1).limit(1)

Is there any way to query the database in a more efficient way? I have to iterate several thousand trails and the performance is currently not high. Of course my queried fields are indexed.

how about taking the max timestamp rather than sorting and taking the last? Check out [Efficiently find the most recent filtered document in MongoDB collection using datetime field](https://stackoverflow.com/questions/70537861/efficiently-find-the-most-recent-filtered-document-in-mongodb-collection-using-d) — JonSG, Aug 01 '23 at 13:56
do you mean querying all timestamps and then calculating the max? I will have a look if pymongo provides a functionality to query the max value. Good hint, thank you! — Robbert, Aug 01 '23 at 13:58

score 1 · Accepted Answer · answered Aug 01 '23 at 17:24

1

If I understand what you are trying to do:

Number of updates for each "trail" (number of docs for each "trail"?)
Most recent update for each "trail" ("$max" "mytimestamp"?)

... and inferring your data model, this aggregation pipeline may be what you are looking for.

db.mycollection.aggregate([
  {
    "$group": {
      "_id": "$trail",
      "updateCount": {
        "$count": {}
      },
      "mostRecentUpdate": {
        "$max": "$mytimestamp"
      }
    }
  }
])

Example output (array of objects like this):

[
  ...
  {
    "_id": "Trois Pistoles",
    "mostRecentUpdate": ISODate("2023-07-31T04:41:24Z"),
    "updateCount": 30
  },
  ...
]

Try it on mongoplayground.net.

answered Aug 01 '23 at 17:24

rickhg12hs

10,638
6
24
42

thanks, worked perfectly fine! Can I additionally get the field "user" and "comment" for the mostRecentUpdate? So for each trail multiple documents are available that can be distinguished by the "user". It also provides a field "comment". Can I get those information within the same aggregate function? Something like "user1": "comment xy", "user2": "comment xyz", and so on? – Robbert Aug 02 '23 at 10:09
@Robbert I'm not sure I understand your document model, nor exactly what you want to return from the aggregation pipeline. My first instinct is, yes you can get that info, but rather than me guessing about your documents and desired output, I think it would be better if you ask a new question that explains everything. – rickhg12hs Aug 02 '23 at 10:32
Thank you, I did that. The question I posted here: https://stackoverflow.com/questions/76819766/mongodb-aggregation-get-values-of-multiple-documents-for-last-timestamp. I would gratefully appreciate your help! – Robbert Aug 02 '23 at 12:13

Improve MongoDB query in Python

1 Answers1

Linked