1

Why can't mongodb effectively sort a non-prefixed sub-index of a compound index?

https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-and-non-prefix-subset-of-an-index

for example if you index { a: 1, b: 1, c: 1 } on a collection you can sort with { a: 1, b: 1 } or { a: 1 }, these are called prefixes

find().sort({ a: 1 }).limit(10)

the same way if we put a condition of equality on the "a" keys, we can sort without problems with the "b" keys

find({ a:'example' }).sort({ b: 1 }).limit(10)

more generally if we set a condition of belonging to a finite set, (belonging to a list of elements), we will be able to retrieve each of the keys belonging to the list, and then perform a SORT_MERGE up to the selected document limit since each key "a" gives us access to an ordered index of "b" keys so these queries will be done normally using the index

find({ a: { $in: ['exa','ample','test'] }).sort({ b: 1 }).limit(10)

But on the other hand, if I try a request like that:

find().sort({ b: 1 }).limit(10)

he will make a COLSCAN (he will go through all the documents) and then he will sort through ALL the data, and finally take the last 10. The index { a: 1, b: 1, c: 1, c: 1 } could be used much more efficiently. We could scan the keys indexed in "a" and then perform a SORT_MERGE with the requested limit. In the end it makes us an algorithm in O (keys + limits) where keys is the number of different "a" keys and limits the number of documents we want to retrieve. It saves us a COLSCAN and it saves us from sorting the collected data. It is reasonable to assume that there are fewer different "a" keys than documents since the compound index should not be used. The fewer different "a" keys there are, the more the performance gain should be felt.

Here I only mentioned the case where no condition is placed on the first index, but if we add a condition like this

find({ { a: { $gt: 100 } }).sort({ b: 1 }).limit(10)
find({ a: /^spa/i }).sort({ b: 1 }).limit(10)

we could imagine exactly the same operation except that we limited the number of "a" keys to scanned.

We could gain crazy performance for some requests, even do without a few sub-indexes in some cases

Where am i wrong ?

Equitable
  • 37
  • 5
  • Does this answer your question? https://stackoverflow.com/questions/36142299/how-does-sorting-with-an-index-work-in-mongodb/36420196#36420196 – kevinadi Aug 13 '19 at 02:05
  • Regarding the comment `We could scan the keys indexed in "a" and then perform a MERGE_SORT with the requested limit.`, it _may_ work if you have discrete values in `a`. It will not work very well if you have an array, or float as `a`. At some point, you have to merge so much stuff that it doesn't work anymore without consuming a lot of resources. – kevinadi Aug 13 '19 at 02:12

0 Answers0