1

Per Mongo's latest $group documentation, there is a special optimization for $first:

Optimization to Return the First Document of Each Group

If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.

It makes sense, since only the first entry in an ordered index should be needed for each bin in the $group stage. Unfortunately, in my testing, I've gotten a query that renders ~800k sorted records in about 1s, then passes them to $group, where it takes about 10s to render the 1.7k output docs for some values of key (see example below). For other values of key, it times out at 300s. There should be exactly 1704 bins in the group regardless of key, and those query bins should be covered by the first three entries in the index, as near as I can tell. Am I missing something?

db.getCollection('time_series').aggregate([
    {
        '$match': {
            'organization_id': 1,
            'key': 'waffle_count'
        }
    },
    {
        '$sort': {
            'key': 1, 'asset_id': 1, 'date_time': - 1
        }
    },
    {
        '$group': {
            '_id': {
                'key': '$key', 'asset_id': '$asset_id'
            },
            'value': {
                '$first': '$value'
            }
        }
    }
]);

Here is the index:

{
    "organization_id": 1,
    "key": 1,
    "asset_id": 1,
    "date_time": -1
}
Derek
  • 1,466
  • 15
  • 24
  • What does `.explain` say? Also it depends whether you have compound index or single field index – Ashh Apr 22 '20 at 21:39
  • It is difficult to analyze your query with available info. Due to the nature of compund indexes and index prefixes, the example of "Optimization to Return the First Document of Each Group" may not apply to your case. As such even the documentation suggests, "may be" (my own words). But, you can try the following compound indexes, _individually_, and see what the results for the _query_ as well as the _query plan_ are: `{ "organization_id": 1, "key": 1, "asset_id": 1, "date_time": -1, "value": 1 }` _and_ `{ "key": 1, "asset_id": 1, "date_time": -1, "value": 1, "organization_id": 1 }`. – prasad_ Apr 23 '20 at 03:23
  • @Ashh, .explain is how I knew that everything through the sort runs in about 1s, returning ~800k sorted entries from the index, and that the remainder of the query takes 10s under the best circumstances. – Derek Apr 23 '20 at 12:32

2 Answers2

1

I sent a request to Atlas's MongoDB Support. The optimization that I quoted isn't available until version 4.2 (we are using 3.6). Quoting Atlas Support:

The enhancement that you're mentioning was implemented in 4.2 via SERVER-9507. For your particular example, it seems you may also need SERVER-40090 to be implemented in order for your pipeline to fully take advantage of the improvement. We will let the team know of its potential benefit for your specific situation.

As of now, the second issue is not fixed and requires a simple $group _id setup like:

'_id': 'asset_id': '$asset_id'

Whereas a key specified as an object will fail to use the index, even if it is not a composite key, like so:

'_id': { 'asset_id': '$asset_id' }
Derek
  • 1,466
  • 15
  • 24
  • Thanks for letting us know what the vendor support folks told you. – Joe Apr 23 '20 at 18:15
  • Quick update - we updated to MongoDB 4.2 and got about 10x improvement in speed due to this optimization, even without the second fix yet. – Derek Aug 04 '20 at 20:30
0

I am almost running into similar situation where we have a pipeline of match,sort and group in same order . while match and sort stage are able to use index group is not using index even with 4.2 . Even after https://jira.mongodb.org/browse/SERVER-40090 is implemented i dont think it will allow compound key on group _id . For e.g

'_id': { 'asset_id': '$asset_id' }

^^ will be supported

'_id': {'key': '$key', 'asset_id': '$asset_id'}

However i don't think compound _id on group will be able to use index like in example above ^^

Abhi
  • 1
  • 1