3

I have a MongoDB Sharded Cluster hosting 250+ million documents.

Document structure is as follows:

{
    "app_id": "whatever", 
    "created": ISODate("2018-05-06T12:13:45.000Z"),
    "latest_transaction": ISODate("2019-03-06T11:11:40.000Z"),
    "anotherField1": "Str", "anotherField2": "Str", ...otherfields
}
{
    "app_id": "whatever", 
    "created": ISODate("2018-04-06T12:13:45.000Z"),
    "latest_transaction": ISODate("2019-03-06T11:11:40.000Z"),
    "uninstalled": ISODate("2019-03-07T11:11:40.000Z"),
    "anotherField1": "Str", "anotherField2": "Str", ...otherfields
}

So basically some documents have the field uninstalled, some don't.

The following is the query on the collection (It's pymongo's explain, sorry for the datetime.datetime s):

{
    '$and': [
        {'app_id': {'$eq': 'whatever'}},
        {'created': {'$lt': datetime.datetime(2019, 3, 7, 0, 0)}},
        {'latest_transaction': {'$gt': datetime.datetime(2019, 2, 5, 0, 0)}},
        {'$nor': [{'uninstalled': {'$lt': datetime.datetime(2019, 3, 7, 0, 0)}}]}
    ]
}

Here is the two relevant indices I have on the collection:

Index1: {"created": 1, "latest_transaction": -1, "uninstalled": -1, "app_id": 1}
Index2: {'app_id': 1, 'anotherField1': 1, 'anotherField2': 1}

Now the problem is, the MongoDb query planner never seems to pick the Index1 I have on the collection for this exact same purpose!

My initial impression was that the query will use a covered index with the way I've structured indices [hence, blazingly fast], but weird to me, mongodb is using Index2 and everything is too slow, taking 10 minutes+ sometimes, and usually around 6 minutes for a result set of 1.5 million documents [i.e. matching app_id has about 1.5 million docs].

Here's the output of explain on the query, showing rejected plan using "Index1"

{
    'inputStage': {
        'inputStage': {
            'direction': 'forward',
            'indexBounds': {
                'app_id': ['["whatever", "whatever"]'],
                'created': ['(true, new Date(1551916800000))'],
                'latest_transaction': ['[new Date(9223372036854775807), new Date(1549324800000))'],
                'uninstalled': ['[MaxKey, new Date(1551916800000)]', '[true, MinKey]']
            },
            'indexName': 'created_1_latest_transaction_-1_uninstalled_-1_app_id_1',
            'indexVersion': 2,
            'isMultiKey': False,
            'isPartial': False,
            'isSparse': False,
            'isUnique': False,
            'keyPattern': {
                'app_id': 1.0,
                'created': 1.0,
                'latest_transaction': -1.0,
                'uninstalled': -1.0
            },
            'multiKeyPaths': {'app_id': [], 'created': [], 'latest_transaction': [], 'uninstalled': []},
            'stage': 'IXSCAN'},
        'stage': 'FETCH'},
    'stage': 'SHARDING_FILTER'
}

And the following is the winning plan using irrelevant, uncovered, Index2:

{'inputStage': {
    'inputStage': {'direction': 'forward',
                   'indexBounds': {
                       'app_id': ['["whatever", "whatever"]'],
                       'anotherField1': ['[MinKey, MaxKey]'],
                       'anotherField2': ['[MinKey, MaxKey]']},
                   'indexName': 'app_id_1_anotherField2_1_anotherField1_1',
                   'indexVersion': 2,
                   'isMultiKey': False,
                   'isPartial': False,
                   'isSparse': False,
                   'isUnique': False,
                   'keyPattern': {'app_id': 1, 'anotherField1': 1, 'anotherField2': 1},
                   'multiKeyPaths': {'app_id': [], 'anotherField1': [], 'anotherField2': []},
                   'stage': 'IXSCAN'},
    'stage': 'FETCH'},
    'stage': 'SHARDING_FILTER'
}
  • Any ideas on why mongodb won't use my index correctly?
  • Is it because uninstalled might not be present in some docs?
  • Some explanations on the direction of the indices when doing compound date queries also would be greatly appreciated, maybe the reason is the index directions? (1, -1, -1, 1)

Thanks! :)

------------ EDIT --------------

Full result of the explain is a bit long so I've pasted it here, it explains the queryPlanner's choice of index (Index2).

Also about the shard_key, it's completely different than what's being queried here, that's why I'm defining a separate specific index only for this query. (shard key is a compound index on (app_id, android_id, some_other_field_not_in_query).

SpiXel
  • 4,338
  • 1
  • 29
  • 45

2 Answers2

4

Covered queries require proper projection - please ensure you ask to return only fields that are in the index. Specifically to sharded collections, the index also should contain the shard key: https://docs.mongodb.com/manual/core/query-optimization/#restrictions-on-sharded-collection.

You can get more details from explain using allPlansExecution parameter. It will show you how the planner run samples and why index2 wins.

https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/plan_ranker.cpp#L191 is how the score is calculated:

baseScore = 1
productivity = advanced / works // the main one 

tieBreak = very_small_number
   + noFetchBonus // 0 for not covered queries
   + noSortBonus // 0 for no sort
   + noIxisectBonus // 0 for index intersection

score = baseScore + productivity + tieBreakers

It picks the plan with higher score on the first 100 docs returned (advanced) which usually gives a good idea of how it will work for the whole query. If you doubt it, try to hint the other index and check if it is any quicker.

UPDATE

shard key is a compound index on (app_id, android_id, some_other_field_not_in_query

Kinda explains it. app_id is a common prefix in both sharding key and Index2. It means that using this index mongo can instantly decide which shards to query. Change order of fields in the Index1 to match sharding key prefix:

Index1: {"app_id": 1, "created": 1, "latest_transaction": -1, "uninstalled": -1}

The essential numbers from explain:

   u'inputStage': {u'advanced': 0,
     u'indexName': u'created_1_latest_transaction_-1_uninstalled_-1_app_id_1',       


   u'inputStage': {u'advanced': 88,
     u'indexName': u'app_id_1_is_enabled_1_another_id_1',

   u'inputStage': {u'advanced': 12,
     u'indexName': u'app_id_1_uninstalled_1_is_enabled_1',

   u'inputStage': {u'advanced': 101,
     u'indexName': u'app_id_1_is_enabled_1_gaid_1',

The winner is app_id_1_is_enabled_1_gaid_1 because it managed to return 101 documents during evaluation. The one without matching prefix created_1_latest_transaction_-1_uninstalled_-1_app_id_1 is at least 100 times slower.

Alex Blex
  • 34,704
  • 7
  • 48
  • 75
  • Thanks for the info. Actually I don't really care about the query being covered. I just want it to be matched to the index I already have "on all the query fields"! Why wouldn't that work? I have a compound index covering all fields in that query, why wouldn't it use that index? Are the index directions wrongs? Or like the $nor query won't use the index? Any ideas ? – SpiXel Mar 08 '19 at 12:33
  • 1
    hmm, I doubt I can make the phrase "It picks the plan with higher score" more clear. If you provide missing details I can update the answer with exact numbers. Please include the full query (projection, sorting etc), sharding key index, result of explain with allPlansExecution parameter. Direction matters only if you have sorting. If you don't sort, order of fields in the compound index is more important. If you don't care why and just want to use the other index - force it with hint. – Alex Blex Mar 08 '19 at 13:07
  • I've provided the information you wanted in an edit. Basically there are no sorts, and I'm wondering for the case where there's also no projection. Explain is also for the simple `find(query).explain()`. It's interesting that direction doesn't matter at all if there's no sorts! Thanks, didn't know that. I'd be really glad if you could take a look at the explain and see if you can help me with this issue :) Thanks again – SpiXel Mar 08 '19 at 18:05
  • hmm, well, shard_key is compound (with 2 other fields), and I have 4 shards overall, so most probably, mongodb has to query all shards anyways. What I don't understand is the performance of the query in each shard, because now that you have to query all shards anyhow, why not use the better index matching all the filtered fields? Am I misinterpreting something? (I'm basically disagreeing with app_id being a shard_key prefix so it improves query performance, relying on the idea that it's a scatter gather anyways) – SpiXel Mar 08 '19 at 18:29
  • Well, `find(query).explain()` is not very useful. I was asking about `find(query).explain("allPlansExecution")` but I guess it is not necessary any more. I have updated the answer with explanation regarding sharding keys. The index with longer matching prefix will always win. And of course there is no need to talk about covered queries. It's not gonna happen with such shard key. – Alex Blex Mar 08 '19 at 18:31
  • O actually it's allPlansExecution, I just omitted that in the comment to be brief :) – SpiXel Mar 08 '19 at 18:31
  • https://docs.mongodb.com/manual/core/index-compound/#prefixes may explain how compound indices work – Alex Blex Mar 08 '19 at 18:37
  • I know how index prefixes work. The thing is that I'm not using a prefix in my query. i'm *completely matching the index (Index11)* in my query, and If planner chooses Index2, then yeah, it's matching only a prefix. Add that to the fact that the query is definitely going to be a scatter-gather query (would need to be executed on all shards anyways), having the app_id at the beginning shouldn't be that much important. (Why would it matter? It's knowing you have to target 4 shards using app_id prefix (Index2) vs not knowing and targeting all 4 shards (Index1). It shouldn't matter) I don't know – SpiXel Mar 08 '19 at 18:42
  • Are you saying that cardinality of app_id is 1? – Alex Blex Mar 08 '19 at 18:48
  • No it's not one. I'm targeting 1 app_id, but due to the compound shard_key, documents belonging to the single app_id, are scattered across all shards. I'm saying knowing app_id in adv. (Index2) shouldn't help, since you have to target all the shards anyways. Using the app_id index prefix, mongodb can only get to know it has to now target all shards. It's not like It's omitting 3 shards and targeting 1 shard using Index2 and the app_id prefix, hence my assumption that Index1 makes more sense, but seems it doesn't, due to numbers in the explains. – SpiXel Mar 08 '19 at 19:00
  • Okay, so I've tested a 1m sample of the data in a single instance mongodb, and it seems the planner successfully uses my intended index (Index1). So it most probably has to do with the sharded deployment. But still, I don't understand why falling back on index1, would result in the whole cluster keys [200m+] to be examined (totalKeysExamined). There's a matching index (Index1) in each shard, for all the fields being queried. so at most it should examine [4 (numShards)] * [indexMatches in eachShard] keys. – SpiXel Mar 09 '19 at 07:18
  • And now another observation! I've dropped the old index, re-created the new index with app_id being at the beginning, and nothing's changed. still the irrelevant index is picked, interesting thing being executionTime for that irrelevant index is more than 2 times higher than executionTime with Index1, but still, planner assigns a higher score to that (1.51111.. vs 1.48167...) and picks that one. – SpiXel Mar 09 '19 at 09:26
2

Answering my own question here,

MongoDB's query planner scores seem to be re-adjusted now and they now reflect a higher value for the Index that's matching all find predicates.

So basically, it needed several hours of time to figure out Index1: {"created": 1, "latest_transaction": -1, "uninstalled": -1, "app_id": 1} should have a higher score than the other index, while I expected the change in behavior to be instantaneously.

The score assigned and current evaluations of the planner also can be accessed in Mongodb, following commands helped me figure out the scores and how they are progressing through time.

var queryShape = db.installation.getPlanCache().listQueryShapes()[IDX]
db.installation.getPlanCache().getPlansByQuery(queryShape)
SpiXel
  • 4,338
  • 1
  • 29
  • 45