
I am using Mongo 4.2 (stuck with this version) and have a collection, say "product_data", with documents of the following schema:

_id:"2lgy_itmep53vy"
uIdHash:"2lgys2yxouhug5xj3ms45mluxw5hsweu"
userTS:1494055844000

Case 1: With this, I have the following indexes for the collection:

  1. _id:Regular - Unique
  2. uIdHash: Hashed
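For reference, a minimal sketch of how this index setup could be reproduced in the mongo shell (collection name taken from the question):

```
// The unique index on _id is created automatically with the collection,
// so only the hashed index on uIdHash needs to be created explicitly.
db.product_data.createIndex({ uIdHash: "hashed" })
```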

I tried to execute

db.product_data.find({ "uIdHash": "2lgys2yxouhug5xj3ms45mluxw5hsweu" }).sort({ "userTS": -1 }).explain()

and these are the stages in the result:

[explain output screenshot: the winning plan runs an IXSCAN on the hashed uIdHash index, a FETCH, and then an in-memory SORT stage]

Of course, I realized that it would make sense to have an additional compound index to avoid the Mongo in-memory 'Sort' stage.

Case 2: Now I added another index on top of the existing ones:

  3. {uIdHash: 1, userTS: -1}: Regular and Compound
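A minimal sketch of the corresponding shell command, following the equality-then-sort key ordering:

```
// Equality field (uIdHash) first, then the sort field (userTS, descending).
db.product_data.createIndex({ uIdHash: 1, userTS: -1 })
```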

As expected, the execution result here was able to eliminate the in-memory sort stage:

[explain output screenshot: the winning plan runs an IXSCAN on the compound index and a FETCH, with no SORT stage]



All good so far. Now I am looking to build pagination on top of this query, so I need to limit the data queried. Hence the query further translates to

db.product_data.find({ "uIdHash": "2lgys2yxouhug5xj3ms45mluxw5hsweu" }).sort({ "userTS": -1 }).limit(10).explain()
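As an aside (not taken from the original post): with all three indexes in place, both cases can be reproduced side by side by forcing each index with `hint()`; the key patterns below are the ones listed above.

```
// Force the hashed index (Case 1) and then the compound index (Case 2)
// so both plans can be compared on the same data.
db.product_data.find({ "uIdHash": "2lgys2yxouhug5xj3ms45mluxw5hsweu" })
  .sort({ "userTS": -1 }).limit(10)
  .hint({ uIdHash: "hashed" })          // Case 1: hashed index + in-memory sort
  .explain("executionStats")

db.product_data.find({ "uIdHash": "2lgys2yxouhug5xj3ms45mluxw5hsweu" })
  .sort({ "userTS": -1 }).limit(10)
  .hint({ uIdHash: 1, userTS: -1 })     // Case 2: compound index, no SORT stage
  .explain("executionStats")
```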

The results for each case are now as follows:

Case 1 Limit Result: [explain output screenshot: same plan with an in-memory SORT; no separate LIMIT stage, the limit is applied within the sort]

The in-memory sorting does less work (36 instead of 50) and returns the expected number of documents. Fair enough, a good underlying optimization in the stage.

Case 2 Limit Result: [explain output screenshot: IXSCAN on the compound index, then FETCH, then a LIMIT stage]

Surprisingly, with the index in use and the data queried, there is an additional Limit stage added to the processing!

The doubts I now have are as follows:

  1. Why do we need an additional stage for LIMIT, when we already have 10 documents returned from the FETCH stage?
  2. What would be the impact of this additional stage? Given that I need pagination, shall I stick with Case 1 indexes and not use the last compound index?
– Naman

2 Answers


Limit stage tells you that the database is limiting the result set. This means subsequent stages will work with less data.

Your question of "why do we need an additional stage for limit" doesn't make sense. You send the query to the database; you do not use (or need) any stages yourself. The database decides how to fulfill the query. If you asked it to limit the result set, it does that, and it communicates that it has done so by reporting a Limit stage in the query processing.
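A minimal sketch of where that stage shows up, assuming the Case 2 query from the question (the field names follow the standard explain document layout):

```
// Walk the winning plan from the top-level stage down through its inputs.
let stage = db.product_data
  .find({ "uIdHash": "2lgys2yxouhug5xj3ms45mluxw5hsweu" })
  .sort({ "userTS": -1 }).limit(10)
  .explain().queryPlanner.winningPlan;

while (stage) {
  print(stage.stage);        // prints LIMIT, FETCH, IXSCAN for the compound-index plan
  stage = stage.inputStage;  // undefined once the bottom of the plan is reached
}
```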

– D. SM
  • the question is based upon why the database performs the redundant effort in the limit. Do notice the returned values of the scan and the fetch stage; they have already provided only 10 documents while there were a total of 24 documents that could have matched. Just to contradict "it communicates to you that it has done that, by telling you there is a limit stage": that should be the behavior with the single hashed index (case 1) as well then, right? (I don't believe it's just trying to convey to me that I had called LIMIT.) – Naman Jan 25 '21 at 18:09
  • Limit stage is the thing that produces the 10 documents, hence it is not "redundant". What would it be redundant with? – D. SM Jan 25 '21 at 20:54
  • Do notice the `nReturned` value from `FETCH` and `IXSCAN` stages. The documents returned are already 10 from within. – Naman Jan 26 '21 at 03:23
  • Fetch and index scan will return up to the entire collection. It's the limit stage that stops them at 10 documents. – D. SM Jan 26 '21 at 05:21
  • Not really sure I understood the previous comment. Could you explain "return up to the entire collection."? – Naman Jan 26 '21 at 08:22
  • I had posted this prior to this on the community, where [this answer](https://developer.mongodb.com/community/forums/t/unwanted-limit-stage-in-querying-data/14863/10?u=naman) helped me understand the details. The mention of the cursor and its relevance justifies the use of `limit` stage. – Naman Feb 03 '21 at 04:23

The query executor is able to perform some optimizations. One of these is that when there is a limit and no blocking stage (such as a sort), all of the upstream stages can stop early once the limit is reached.

This means that if there were no limit stage, the ixscan and fetch stages would have continued through all 24 matching documents.

There is no discrete limit stage with the non-index sort because it is combined with the sort stage.
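A minimal sketch of how this early termination could be verified, assuming the compound index from the question is in place (`totalKeysExamined` and `totalDocsExamined` are standard `executionStats` fields):

```
// With the limit, the scan should stop after roughly 10 index keys;
// without it, all matching keys/documents (24 in the question's data) get examined.
const filter = { "uIdHash": "2lgys2yxouhug5xj3ms45mluxw5hsweu" };

const limited = db.product_data.find(filter).sort({ "userTS": -1 }).limit(10)
  .explain("executionStats").executionStats;
const unlimited = db.product_data.find(filter).sort({ "userTS": -1 })
  .explain("executionStats").executionStats;

print("with limit:    keys=" + limited.totalKeysExamined + " docs=" + limited.totalDocsExamined);
print("without limit: keys=" + unlimited.totalKeysExamined + " docs=" + unlimited.totalDocsExamined);
```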

– Joe
  • should I infer from this that ixscan and fetch by themselves are non-terminating operations and they still worked with and returned 24 documents, just that the addition of the limit stage makes it display 10? – Naman Jan 26 '21 at 10:12
  • No, that's one of the benefits of using the index for selection and sorting, both stages can stop examining keys and documents early. – Joe Jan 26 '21 at 17:00