This is a follow up question on my original post here: Improve MongoDB query in Python
My collection of hiking trails looks like this: Each trail might be available multiple times for the same date based on comments set by users.
{
"_id": ..,
"trailName": "Trail1"
"update": ISODate("2023-07-30T08:00:00Z"),
"user": "Tom",
"comment": "xy"
},
{
"_id": ..,
"trailName": "Trail1"
"update": ISODate("2023-07-31T08:00:00Z"),
"user": "Mary",
"comment": "xy"
},
{
"_id": ..,
"trailName": "Trail1"
"update": ISODate("2023-07-31T08:00:00Z"),
"user": "Joe",
"comment": "xyz"
},
{
"_id": ..,
"trailName": "Trail2"
"update": ISODate("2023-07-10T08:00:00Z"),
"user": "Tom",
"comment": "xy"
}
In python I query all trail names and iterate them. For each trail I want to query:
- It's name
- The number of elements that are available (updateCount)
- The last update (mostRecentUpdate)
- All users and their comments for the last update
So far I could query everything except getting the users and the comments. My query looks like this:
db.mycollection.aggregate([
{
"$group": {
"_id": "$trailName",
"updateCount": {
"$count": {}
},
"mostRecentUpdate": {
"$max": "$update"
}
}
}
])
As a result I get back:
{
"_id": "Trail1",
"updateCount": 2,
"mostRecentUpdate": ISODate("2023-07-31T08:00:00Z")
},
{
"_id": "Trail2",
"updateCount": 1,
"mostRecentUpdate": ISODate("2023-07-10T08:00:00Z")
}
What I would like to get back is something like this:
{
"_id": "Trail1",
"updateCount": 2,
"mostRecentUpdate": ISODate("2023-07-31T08:00:00Z"),
"users": "Mary", "Joe"
"comments": "xy", "xyz"
},
{
"_id": "Trail2",
"updateCount": 1,
"mostRecentUpdate": ISODate("2023-07-10T08:00:00Z")
"users": "Tom"
"comments": "xy"
}
How can I achieve this to get ONLY the users and comments for the most recent date? When I add in the $group step something like:
"comments": { "$push": { "comment": "$comment", "user": "$user" } }
I will get all comments and users from ALL dates. But I only want it for the last date. I could not set a condition or filter for that. Also, I cannot first filter only the most recent date, because I need to get the number of updates that have been.
So my question is: How can I set a filter to get the comments for the most recent date?
I need to accomplish this task using PyMongo.