Per Mongo's latest $group documentation, there is a special optimization for $first:
Optimization to Return the First Document of Each Group
If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.
It makes sense, since only the first entry in an ordered index should be needed for each bin in the $group stage. Unfortunately, in my testing, I've gotten a query that renders ~800k sorted records in about 1s, then passes them to $group, where it takes about 10s to render the 1.7k output docs for some values of key
(see example below). For other values of key
, it times out at 300s. There should be exactly 1704 bins in the group regardless of key
, and those query bins should be covered by the first three entries in the index, as near as I can tell. Am I missing something?
db.getCollection('time_series').aggregate([
{
'$match': {
'organization_id': 1,
'key': 'waffle_count'
}
},
{
'$sort': {
'key': 1, 'asset_id': 1, 'date_time': - 1
}
},
{
'$group': {
'_id': {
'key': '$key', 'asset_id': '$asset_id'
},
'value': {
'$first': '$value'
}
}
}
]);
Here is the index:
{
"organization_id": 1,
"key": 1,
"asset_id": 1,
"date_time": -1
}