1

I have a collection test, and a compound index on it with two fields

db.test.createIndex({ i: 1, j: 1 })

When I execute following pipeline

db.test.aggregate([{ $sort: { i: 1, j: 1 } }], { allowDiskUse: false })

it works fine. But this pipeline

db.test.aggregate([{ $sort: { i: 1, j: -1 } }], { allowDiskUse: false })

fails with the error that says "Sort exceeded memory limit". The reason is more less clear. The sort order in the pipeline does not match the order in the index and therefore mongodb decides not to use the index and sort the whole collection, which, in turn, does not fit in memory.

However I suspect that mongodb could be slightly smarter. Instead of sorting the whole collection it could use the index to delimit blocks of documents, for which field i is the same, and then sort documents only within such blocks. The documents of the same block have more chances to fit in memory and therefore the pipeline can perform more efficiently. Can I make mongodb server do so? How? If not, what prevents this.

2 Answers2

1

It seems mongod do not identify that can use the index , but you can try to hint him as follow:

db.test.aggregate([ {$sort:{i:1,j:-1}} ],{hint:"i_1_j_1"})
R2D2
  • 9,410
  • 2
  • 12
  • 28
  • 1
    Yes. `hint` makes it to use index. However it seems that the result of index scan is resorted as a whole, not in blocks. And, as a consequence, memory limit is exceeded anyway. – Dmytro Sheyko Sep 16 '22 at 18:59
0

A similar question was asked a few days later here. As @Tom Slabbaert mentioned in the comments, the answer is that no, at the time of writing, MongoDB does not appear to support using the index in the situation described to provide an incremental sort. There is no (non-hacky) way to force the system to do this, especially in a way that would be flexible and deliver performance benefits.

Some additional things to consider with respect to the presumed goal of improved performance:

  • What's the end result of what you're trying to achieve here? Is there a particular reason that would the compound sort is necessary and/or that the index couldn't be adjusted (to have j in descending order to allow it to support the sort)?
  • The sample pipelines explicitly have allowDiskUse set to false. Is there a reason for that? Setting it to true should allow the operation to complete successfully.
  • Relatedly, allowDiskUse now defaults to true beginning in version 6.0.

Edit: Per the comments, the request for this functionality in MongoDB appears to be tracked either here or here.

user20042973
  • 4,096
  • 2
  • 3
  • 14
  • Here the `allowDiskUse` is set to `false` for demonstration purpose, just to make sure that it doesn't sort "incrementally". And yes, when `allowDiskUse` is set to `true` the pipeline completes successfully, but it works noticeably longer. – Dmytro Sheyko Sep 26 '22 at 12:05
  • As for the end result, I was trying to figure out is this feasible to give the end user means to sort the data with arbitrary order, i.e. the order that the end user is able to customize. In this particular case I could create 4 indexes: `{i:1,j:1}`, `{i:1,j:-1}`, `{j:1,i:1}`, `{j:1,i:-1}`; but this doesn't seem like a good option. This approach does not scale well when the number of sorted fields is greater. So I would leave the necessary minimun of indexes and having both `{i:1,j:1}` and `{i:1,j:-1}` seemed to me redundant. – Dmytro Sheyko Sep 26 '22 at 12:05
  • 1
    Right - the answer to the original question you asked is that there is no direct way to make MongoDB sort in a blockwise manner. Looks like such functionality is tracked in jira.mongodb.org/browse/SERVER-62405. Offering arbitrary sorting (and filtering) criteria makes it basically impossible to provide a set of indexes that would perfectly satisfy all possible combinations. But that's also generally acceptable. You can offer preferred/recommended options and warn that other combinations may be slower. Also, sorting the entire collection may not be representative of filtered result sets – user20042973 Sep 26 '22 at 19:27
  • 1
    https://jira.mongodb.org/browse/SERVER-65159 also seems relevant. Edited my answer to reference both – user20042973 Sep 26 '22 at 19:30