1

I have an orders collection with the following index, among others:

{location: 1, completedDate: 1, estimatedProductionDate: 1, estimatedCompletionDate: 1}

I'm performing the following query:

db.orders.find({
  status: {$in: [1, 2, 3]},
  location: "PA",
  $or: [
    {completedDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}},
    {
      completedDate: null,
      estimatedProductionDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}
    }
  ]
}).explain()

I was hoping this would perform an efficient IXSCAN for each branch of the $or, and then combine the results:

        {completedDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}}

        "indexBounds" : {
            "location" : [
                "[\"TX\", \"TX\"]"
            ],
            "completedDate" : [
                "[MinKey, ISODate("2017-08-22T04:59:59.999Z")]"
            ],
            "estimatedProductionDate" : [
                "[MinKey, MaxKey]"
            ],
            "estimatedCompletionDate" : [
                "[MinKey, MaxKey]"
            ]
        }

        {
            completedDate: null,
            estimatedProductionDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}
        }

        "indexBounds" : {
            "location" : [
                "[\"TX\", \"TX\"]"
            ],
            "completedDate" : [
                "[null, null]"
            ],
            "estimatedProductionDate" : [
                "[MinKey, ISODate("2017-08-22T04:59:59.999Z")]"
            ],
            "estimatedCompletionDate" : [
                "[MinKey, MaxKey]"
            ]
        }

Instead, it only bounds the location in the IXSCAN, and does the rest of the filtering during FETCH. Is there any way to optimize this query without splitting it into two separate queries?

"winningPlan" : {
    "stage" : "FETCH",
    "filter" : {
        "$and" : [
            {
                "$or" : [
                    {
                        "$and" : [
                            {
                                "completedDate" : {
                                    "$eq" : null
                                }
                            },
                            {
                                "estimatedProductionDate" : {
                                    "$lt" : "2017-08-22T04:59:59.999Z"
                                }
                            }
                        ]
                    },
                    {
                        "completedDate" : {
                            "$lt" : "2017-08-22T04:59:59.999Z"
                        }
                    }
                ]
            },
            {
                "status" : {
                    "$in" : [
                        1,
                        2,
                        3
                    ]
                }
            }
        ]
    },
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
            "location" : 1,
            "completedDate" : 1,
            "estimatedProductionDate" : 1,
            "estimatedCompletionDate" : 1
        },
        "indexName" : "location_1_completedDate_1_estimatedProductionDate_1_estimatedCompletionDate_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
            "location" : [
                "[\"TX\", \"TX\"]"
            ],
            "completedDate" : [
                "[MinKey, MaxKey]"
            ],
            "estimatedProductionDate" : [
                "[MinKey, MaxKey]"
            ],
            "estimatedCompletionDate" : [
                "[MinKey, MaxKey]"
            ]
        }
    }
},
Andy
  • 7,885
  • 5
  • 55
  • 61

1 Answers1

2

There are three issues that are immediately apparent:

Your index

I'm not sure about the other indexes you have, but your query is of the shape:

{
  status:1,
  location:1,
  $or: [
    {completedDate:1},
    {completedDate:1, estimatedProductionDate:1}
  ]
}

However your index does not contain the term status. You would need the status field in your index to maximize index use.

Your $or query

To paraphrase the page $or Clauses and Indexes:

... for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.

To put it simply, efficient $or queries in MongoDB would require the $or term to be the top-level term, with each part of the term supported by an index.

For example, you may find the performance of the following index and query to be a bit better:

db.orders.createIndex({
  status:1,
  location:1,
  completedDate:1,
  estimatedProductionDate:1
})

db.orders.explain().find({
  $or: [
    {
      status: {$in: [1, 2, 3]},
      location: "PA",
      completedDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}},
    {
      status: {$in: [1, 2, 3]},
      location: "PA",
      completedDate: null,
      estimatedProductionDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}
    }
  ]
})

The reason is because MongoDB treats each of the term in an $or query to be a separate query. Thus, each term can use its own index.

Note that the order of fields in the index I proposed above follows the order of the fields in the query.

However, this is still not optimal, because MongoDB has to perform a fetch with filter: {completedDate: {$eq: null}} after the index scan for a query with completedDate: null. The reason for this is subtle and best explained here:

  1. The document {} generates the index key {"": null} for the index with key pattern {"a.b": 1}.
  2. The document {a: []} also generates the index key {"": null} for the index with key pattern {"a.b": 1}.
  3. The document {} matches the query {"a.b": null}.
  4. The document {a: []} does not match the query {"a.b": null}.

Therefore, a query {"a.b": null} that is answered by an index with key pattern {"a.b": 1} must fetch the document and re-check the predicate, in order to ensure that the document {} is included in the result set and that the document {a: []} is not included in the result set.

To maximize index use, you may be better off just to assign something into the completedDate field instead of setting it to null.

Andy
  • 7,885
  • 5
  • 55
  • 61
kevinadi
  • 13,365
  • 3
  • 33
  • 49
  • Thanks for the detailed answer! I kept `status` out of my index because it would interfere with sorting by the dates via the index. It only has 4 possible values so I'm not concerned about the overhead of a `FETCH` stage to filter out undesired `status`es. – Andy Aug 22 '17 at 03:55
  • I'm not sure you're correct about indexes being unable to store null because I have read that unique indexes can store an entry for null, and also do so for documents that don't contain the field in question: https://docs.mongodb.com/manual/core/index-unique/ – Andy Aug 22 '17 at 04:01
  • I suppose I could use one index including `status` for this query and one without it for the query that needs to be sorted though. – Andy Aug 22 '17 at 04:04
  • Oops you're right, `null` do get indexed. Updated the answer to reflect that fact. – kevinadi Aug 22 '17 at 05:05
  • Cool. I don't need to tell the difference between `null` value vs missing field, so the index should efficiently satisfy my query including `completedDate: null`, right? – Andy Aug 22 '17 at 17:26
  • I think there's still a mistake in your answer; querying for `completedDate: null` matches both documents that have that value, and that don't have the field at all. Both are stored the as null in the index so Mongo will get all matching documents from the index, without performing any fetch. – Andy Aug 22 '17 at 18:11
  • Wait what the heck. When I explain the query with all necessary index fields in the or clauses, it does a FETCH to check that `completedDate: {eq: null}` after doing an IXSCAN with `completedDate: ["[null,null]"]`. Why would it need to do the fetch if the query means get all documents that have `null` for that field or don't have that field at all? – Andy Aug 22 '17 at 18:29
  • Though I guess that's not a big problem because it would FETCH anyway, even if it didn't perform that unnecessary filtering. – Andy Aug 22 '17 at 18:32
  • One last note: when i changed the order of the fields in the query, it didn't affect the query plan. So I think the order only matters in the index definition. – Andy Aug 22 '17 at 18:34
  • @Andy there's no mistake in the answer. This part: `This is because null values are indexed, but there is no information in the index whether the related document actually has null as the value, or is missing the field altogether.` explains what you were saying in your comments. – kevinadi Aug 22 '17 at 22:13
  • @Andy the FETCH is because this document: `{a:null, b:1}` and this: `{b:1}` with an index of `{a:1}` will both be indexed as `{a:null}`. **MongoDB cannot know which case it is from the index alone, since the index entry for both documents are identical**, so it would have to fetch the document to find out. – kevinadi Aug 22 '17 at 22:24
  • @Andy order of the fields in the query makes no difference to the query plan because `{a:1, b:1}` is identical to `{b:1, a:1}`, but this is off-topic for the question. – kevinadi Aug 22 '17 at 22:26
  • @Andy I put the example re: `a==null` index entries and why this makes it less efficient in the answer for future readers. – kevinadi Aug 22 '17 at 22:32
  • but I don't understand why Mongo needs to know which case it is, because the query `{a:null}` matches both `{a:null, b:1}` and `{b:1}`, so an index scan should find all the matching documents. I understand why FETCH is needed if the query is `{a: {$type: 10}}` or `{a: {$exists: false}}`, since those match only one of the two cases, but not if the query is just `{a: null}`, which matches both cases. – Andy Aug 23 '17 at 01:47
  • @Andy I think the discussion regarding how MongoDB treats `null` values should be moved to a new question. I have answered the original question, which is `Is there any way to optimize this query without splitting it into two separate queries` and give suggestion to not put `null` values there. Please create a **new question** regarding `null` values, and I'll be happy to explain further. – kevinadi Aug 23 '17 at 03:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/152585/discussion-between-andy-and-kevin-adistambha). – Andy Aug 23 '17 at 03:34