24

If I run a mongo query from the shell with explain(), get the name of the index used and then run the same query again, but with hint() specifying the same index to be used - "millis" field from explain plan is decreased significantly

for example

no hint provided:

>>db.event.find({ "type" : "X", "active" : true, "timestamp" : { "$gte" : NumberLong("1317498259000") }, "count" : { "$gte" : 0 } }).limit(3).sort({"timestamp" : -1 }).explain();

{
    "cursor" : "BtreeCursor my_super_index",
    "nscanned" : 599,
    "nscannedObjects" : 587,
    "n" : 3,
    "millis" : 24,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : { ... }
} 

hint provided:

>>db.event.find({ "type" : "X", "active" : true, "timestamp" : { "$gte" : NumberLong("1317498259000") }, "count" : { "$gte" : 0 } }).limit(3).sort({"timestamp" : -1 }).hint("my_super_index").explain();

{
    "cursor" : "BtreeCursor my_super_index",
    "nscanned" : 599,
    "nscannedObjects" : 587,
    "n" : 3,
    "millis" : 2,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : { ... }
} 

The only difference is "millis" field

Does anyone know why is that?

UPDATE: "Selecting which index to use" doesn't explain it, because mongo, as far as I know, selects index for each X (100?) runs, so it should be as fast as with hint next (X-1) runs

Eugene Platonov
  • 3,145
  • 3
  • 26
  • 20
  • 4
    Either the index is speeding things up that much, or you're just re-executing a cached query, which has much much much lower overhead. – Marc B Oct 11 '11 at 18:24
  • I don't think it's because of caching. If I run the same query without hint 2, 3 or 10 times it will not be much faster, but it always speed up significantly with the hint. – Eugene Platonov Oct 11 '11 at 19:28
  • Could you edit your question to include the output of find(...).explain(true) without the hint. That will print extra info that may help debug this. – mstearn Oct 11 '11 at 21:24
  • 2
    Maybe it takes 22ms to determine what index to use? – wberry Mar 27 '12 at 21:27
  • @wberry answer is appropriate here, explain() with hint() will return rejectedPlans[] as empty since winningPlan was already provided using hint() method. Delta will be more significant for collection with a higher number of indexes. ```"rejectedPlans" : [ { "stage" : "SKIP", "skipAmount" : 493, "inputStage" : {``` vs ```"rejectedPlans" : []``` – Sindhu Nov 09 '20 at 21:09

4 Answers4

28

Mongo uses an algorithm to determine which index to be used when no hint is provided and then caches the index used for the similar query for next 1000 calls

But whenever you explain a mongo query it will always run the index selection algorithm, thus the explain() with hint will always take less time when compared with explain() without hint.

Similar question was answered here Understanding mongo db explain

Community
  • 1
  • 1
Global Warrior
  • 5,050
  • 9
  • 45
  • 75
  • Sounds reasonable. Do you have any references to documentation? – Eugene Platonov Feb 23 '13 at 08:04
  • 4
    "Additionally, the $explain operation reevaluates a set of candidate query plans, which may cause the $explain operation to perform differently than a normal query. As a result, these operations generally provide an accurate account of how MongoDB would perform the query, but do not reflect the length of these queries. When you run explain() with hint(), the query optimizer does not reevaluate the query plans." Extracted from http://docs.mongodb.org/manual/reference/operator/explain/#op._S_explain – Eugene Platonov Mar 27 '13 at 19:22
6

Mongo did the same search both times as you can see from the number of scanned objects. Also you can see that the used index was the same (take a look at the "cursor" entry), both used already your my_super_index index.

"hint" only tells Mongo to use that specific index which it already automatically did in the first query.

The second search was simple faster because all the data was probably already in the cache.

StefanMK
  • 1,313
  • 1
  • 12
  • 22
  • 1
    This is absolutely correct. Your first query is longer running because it page faulted and loaded the data from disk. In order to performance test something like this, you need to run it thousands of times under all kinds of different circumstances similar to what its production load will be and average the results in order to even approach an accurate benchmark. – marr75 Mar 29 '12 at 14:13
  • One note, however, the data is most likely not in the "cache", the database files are memory mapped and have just been loaded into system memory already. Mongo keeps things simple by memory mapping it's files and letting the operating system decide when to swap pages, almost always based on frequency and recency of access. – marr75 Mar 29 '12 at 14:16
2

I struggled finding reason for same thing. I found that when we have lots of indexes, mongo is indeed taking more time than using hint. Mongo basically is taking lot of time deciding which index to use. Think of a scenario where you have 40 indexes and you do a query. First task which Mongo needs to do is which index is best suited to be used for particular query. This would imply mongo needs to scan all the keys as well as do some computation in every scan to find some performancce index if this key is used. hint will definitely speed up since index key scan will be saved.

Parag Arora
  • 98
  • 1
  • 6
0

I will tell you how to find out how it's faster 1) without index It will pull every document into memory to get the result 2) with index If you have a lot of index for that collection it will take index from the cache memory 3) with .hint(_index) It will take that specific index which you have mention

with hint() without hint() both time you do .explain("executionStats") with hint() then you can check totalKeysExamined value that value will match with totalDocsExamined without hint() you can see totalKeysExamined value is greter then totalDocsExamined

totalDocsExamined this result will perfectly match with result count most of the time.