17

I have Mongodb collection with about 7 million documents that represents places.

I run a query that search for places that their name start with a prefix near a specific location.

We have a compound index as described bellow to speed up the search.

When the search query find match (even if only one) the query is execute very fast (~20 milisec). But when there is no match it can take 30 sec for the query to execute.

Please assist.

In detailed:

Each place (geoData) has the following fields:

"loc" - a GeoJSON point that represent the location
"categoriesIds" - array of int ids
"name" - the name of the placee

The following index is defined on this collection:

{
  "loc" : "2dsphere",
  "categoriesIds" : 1,
  "name" : 1
}

The query is:

db.geoData.find({
  "loc":{
    "$near":{
      "$geometry":{
        "type": "Point" ,
        "coordinates": [ -0.10675191879272461 , 51.531600743186644]
      },
      "$maxDistance": 5000.0
    }
  }, 
  "categoriesIds":{
    "$in": [ 1 , 2 , 71 , 70 , 74 , 72 , 73 , 69 , 44 , 26 , 27 , 33 , 43 , 45 , 53 , 79]
  }, 
  "name":{ "$regex": "^Cafe Ne"}
})

Execution stats (Link to the whole explain result)

    "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 169,
    "totalKeysExamined" : 14333,
    "totalDocsExamined" : 1,
    "executionStages" : {
        "stage" : "GEO_NEAR_2DSPHERE",
        "nReturned" : 1,
        "executionTimeMillisEstimate" : 60,
        "works" : 14354,
        "advanced" : 1,
        "needTime" : 14351,
        "needFetch" : 0,
        "saveState" : 361,
        "restoreState" : 361,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
            "loc" : "2dsphere",
            "categoriesIds" : 1,
            "name" : 1
        },
        "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
        "searchIntervals" : [ 
            {
                "minDistance" : 0,
                "maxDistance" : 3408.329295346151,
                "maxInclusive" : false
            }, 
            {
                "minDistance" : 3408.329295346151,
                "maxDistance" : 5000,
                "maxInclusive" : true
            }
        ],
        "inputStages" : [ 
            {
                "stage" : "FETCH",
                "nReturned" : 1,
                "executionTimeMillisEstimate" : 20,
                "works" : 6413,
                "advanced" : 1,
                "needTime" : 6411,
                "needFetch" : 0,
                "saveState" : 361,
                "restoreState" : 361,
                "isEOF" : 1,
                "invalidates" : 0,
                "docsExamined" : 1,
                "alreadyHasObj" : 0,
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "filter" : {
                        "TwoDSphereKeyInRegionExpression" : true
                    },
                    "nReturned" : 1,
                    "executionTimeMillisEstimate" : 20,
                    "works" : 6413,
                    "advanced" : 1,
                    "needTime" : 6411,
                    "needFetch" : 0,
                    "saveState" : 361,
                    "restoreState" : 361,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "keyPattern" : {
                        "loc" : "2dsphere",
                        "categoriesIds" : 1,
                        "name" : 1
                    },
                    "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
                    "isMultiKey" : true,
                    "direction" : "forward",
                    "indexBounds" : {
                        "loc" : [ 
                            "[\"2f1003230\", \"2f1003230\"]", 
                            "[\"2f10032300\", \"2f10032300\"]", 
                            "[\"2f100323000\", \"2f100323000\"]", 
                            "[\"2f1003230001\", \"2f1003230001\"]", 
                            "[\"2f10032300012\", \"2f10032300013\")", 
                            "[\"2f1003230002\", \"2f1003230002\"]", 
                            "[\"2f10032300021\", \"2f10032300022\")", 
                            "[\"2f10032300022\", \"2f10032300023\")", 
                            "[\"2f100323003\", \"2f100323003\"]", 
                            "[\"2f1003230031\", \"2f1003230031\"]", 
                            "[\"2f10032300311\", \"2f10032300312\")", 
                            "[\"2f10032300312\", \"2f10032300313\")", 
                            "[\"2f10032300313\", \"2f10032300314\")", 
                            "[\"2f1003230032\", \"2f1003230032\"]", 
                            "[\"2f10032300320\", \"2f10032300321\")", 
                            "[\"2f10032300321\", \"2f10032300322\")"
                        ],
                        "categoriesIds" : [ 
                            "[1.0, 1.0]", 
                            "[2.0, 2.0]", 
                            "[26.0, 26.0]", 
                            "[27.0, 27.0]", 
                            "[33.0, 33.0]", 
                            "[43.0, 43.0]", 
                            "[44.0, 44.0]", 
                            "[45.0, 45.0]", 
                            "[53.0, 53.0]", 
                            "[69.0, 69.0]", 
                            "[70.0, 70.0]", 
                            "[71.0, 71.0]", 
                            "[72.0, 72.0]", 
                            "[73.0, 73.0]", 
                            "[74.0, 74.0]", 
                            "[79.0, 79.0]"
                        ],
                        "name" : [ 
                            "[\"Cafe Ne\", \"Cafe Nf\")", 
                            "[/^Cafe Ne/, /^Cafe Ne/]"
                        ]
                    },
                    "keysExamined" : 6412,
                    "dupsTested" : 0,
                    "dupsDropped" : 0,
                    "seenInvalidated" : 0,
                    "matchTested" : 1
                }
            }, 
            {
                "stage" : "FETCH",
                "nReturned" : 0,
                "executionTimeMillisEstimate" : 40,
                "works" : 7922,
                "advanced" : 0,
                "needTime" : 7921,
                "needFetch" : 0,
                "saveState" : 261,
                "restoreState" : 261,
                "isEOF" : 1,
                "invalidates" : 0,
                "docsExamined" : 0,
                "alreadyHasObj" : 0,
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "filter" : {
                        "TwoDSphereKeyInRegionExpression" : true
                    },
                    "nReturned" : 0,
                    "executionTimeMillisEstimate" : 40,
                    "works" : 7922,
                    "advanced" : 0,
                    "needTime" : 7921,
                    "needFetch" : 0,
                    "saveState" : 261,
                    "restoreState" : 261,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "keyPattern" : {
                        "loc" : "2dsphere",
                        "categoriesIds" : 1,
                        "name" : 1
                    },
                    "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
                    "isMultiKey" : true,
                    "direction" : "forward",
                    "indexBounds" : {
                        "loc" : [ 
                            "[\"2f1003230\", \"2f1003230\"]", 
                            "[\"2f10032300\", \"2f10032300\"]", 
                            "[\"2f100323000\", \"2f100323000\"]", 
                            "[\"2f1003230001\", \"2f1003230001\"]", 
                            "[\"2f10032300011\", \"2f10032300012\")", 
                            "[\"2f10032300012\", \"2f10032300013\")", 
                            "[\"2f1003230002\", \"2f1003230002\"]", 
                            "[\"2f10032300021\", \"2f10032300022\")", 
                            "[\"2f10032300022\", \"2f10032300023\")", 
                            "[\"2f100323003\", \"2f100323003\"]", 
                            "[\"2f1003230031\", \"2f1003230032\")", 
                            "[\"2f1003230032\", \"2f1003230032\"]", 
                            "[\"2f10032300320\", \"2f10032300321\")", 
                            "[\"2f10032300321\", \"2f10032300322\")", 
                            "[\"2f10032300322\", \"2f10032300323\")"
                        ],
                        "categoriesIds" : [ 
                            "[1.0, 1.0]", 
                            "[2.0, 2.0]", 
                            "[26.0, 26.0]", 
                            "[27.0, 27.0]", 
                            "[33.0, 33.0]", 
                            "[43.0, 43.0]", 
                            "[44.0, 44.0]", 
                            "[45.0, 45.0]", 
                            "[53.0, 53.0]", 
                            "[69.0, 69.0]", 
                            "[70.0, 70.0]", 
                            "[71.0, 71.0]", 
                            "[72.0, 72.0]", 
                            "[73.0, 73.0]", 
                            "[74.0, 74.0]", 
                            "[79.0, 79.0]"
                        ],
                        "name" : [ 
                            "[\"Cafe Ne\", \"Cafe Nf\")", 
                            "[/^Cafe Ne/, /^Cafe Ne/]"
                        ]
                    },
                    "keysExamined" : 7921,
                    "dupsTested" : 0,
                    "dupsDropped" : 0,
                    "seenInvalidated" : 0,
                    "matchTested" : 0
                }
            }
        ]
    },

Execution stats when searching for "CafeNeeNNN" instead of "Cafe Ne" (Link to the whole explain result )

 "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 2537,
    "totalKeysExamined" : 232259,
    "totalDocsExamined" : 162658,
    "executionStages" : {
        "stage" : "FETCH",
        "filter" : {
            "$and" : [ 
                {
                    "name" : /^CafeNeeNNN/
                }, 
                {
                    "categoriesIds" : {
                        "$in" : [ 
                            1, 
                            2, 
                            26, 
                            27, 
                            33, 
                            43, 
                            44, 
                            45, 
                            53, 
                            69, 
                            70, 
                            71, 
                            72, 
                            73, 
                            74, 
                            79
                        ]
                    }
                }
            ]
        },
        "nReturned" : 0,
        "executionTimeMillisEstimate" : 1330,
        "works" : 302752,
        "advanced" : 0,
        "needTime" : 302750,
        "needFetch" : 0,
        "saveState" : 4731,
        "restoreState" : 4731,
        "isEOF" : 1,
        "invalidates" : 0,
        "docsExamined" : 70486,
        "alreadyHasObj" : 70486,
        "inputStage" : {
            "stage" : "GEO_NEAR_2DSPHERE",
            "nReturned" : 70486,
            "executionTimeMillisEstimate" : 1290,
            "works" : 302751,
            "advanced" : 70486,
            "needTime" : 232264,
            "needFetch" : 0,
            "saveState" : 4731,
            "restoreState" : 4731,
            "isEOF" : 1,
            "invalidates" : 0,
            "keyPattern" : {
                "loc" : "2dsphere"
            },
            "indexName" : "loc_2dsphere",
            "searchIntervals" : [ 
                {
                    "minDistance" : 0,
                    "maxDistance" : 3408.329295346151,
                    "maxInclusive" : false
                }, 
                {
                    "minDistance" : 3408.329295346151,
                    "maxDistance" : 5000,
                    "maxInclusive" : true
                }
            ],
            "inputStages" : [ 
                {
                    "stage" : "FETCH",
                    "nReturned" : 44540,
                    "executionTimeMillisEstimate" : 110,
                    "works" : 102690,
                    "advanced" : 44540,
                    "needTime" : 58149,
                    "needFetch" : 0,
                    "saveState" : 4731,
                    "restoreState" : 4731,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "docsExamined" : 44540,
                    "alreadyHasObj" : 0,
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "filter" : {
                            "TwoDSphereKeyInRegionExpression" : true
                        },
                        "nReturned" : 44540,
                        "executionTimeMillisEstimate" : 90,
                        "works" : 102690,
                        "advanced" : 44540,
                        "needTime" : 58149,
                        "needFetch" : 0,
                        "saveState" : 4731,
                        "restoreState" : 4731,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "keyPattern" : {
                            "loc" : "2dsphere"
                        },
                        "indexName" : "loc_2dsphere",
                        "isMultiKey" : false,
                        "direction" : "forward",
                        "indexBounds" : {
                            "loc" : [ 
                                "[\"2f1003230\", \"2f1003230\"]", 
                                "[\"2f10032300\", \"2f10032300\"]", 
                                "[\"2f100323000\", \"2f100323000\"]", 
                                "[\"2f1003230001\", \"2f1003230001\"]", 
                                "[\"2f10032300012\", \"2f10032300013\")", 
                                "[\"2f1003230002\", \"2f1003230002\"]", 
                                "[\"2f10032300021\", \"2f10032300022\")", 
                                "[\"2f10032300022\", \"2f10032300023\")", 
                                "[\"2f100323003\", \"2f100323003\"]", 
                                "[\"2f1003230031\", \"2f1003230031\"]", 
                                "[\"2f10032300311\", \"2f10032300312\")", 
                                "[\"2f10032300312\", \"2f10032300313\")", 
                                "[\"2f10032300313\", \"2f10032300314\")", 
                                "[\"2f1003230032\", \"2f1003230032\"]", 
                                "[\"2f10032300320\", \"2f10032300321\")", 
                                "[\"2f10032300321\", \"2f10032300322\")"
                            ]
                        },
                        "keysExamined" : 102689,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 44540
                    }
                }, 
                {
                    "stage" : "FETCH",
                    "nReturned" : 47632,
                    "executionTimeMillisEstimate" : 250,
                    "works" : 129571,
                    "advanced" : 47632,
                    "needTime" : 81938,
                    "needFetch" : 0,
                    "saveState" : 2556,
                    "restoreState" : 2556,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "docsExamined" : 47632,
                    "alreadyHasObj" : 0,
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "filter" : {
                            "TwoDSphereKeyInRegionExpression" : true
                        },
                        "nReturned" : 47632,
                        "executionTimeMillisEstimate" : 230,
                        "works" : 129571,
                        "advanced" : 47632,
                        "needTime" : 81938,
                        "needFetch" : 0,
                        "saveState" : 2556,
                        "restoreState" : 2556,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "keyPattern" : {
                            "loc" : "2dsphere"
                        },
                        "indexName" : "loc_2dsphere",
                        "isMultiKey" : false,
                        "direction" : "forward",
                        "indexBounds" : {
                            "loc" : [ 
                                "[\"2f1003230\", \"2f1003230\"]", 
                                "[\"2f10032300\", \"2f10032300\"]", 
                                "[\"2f100323000\", \"2f100323000\"]", 
                                "[\"2f1003230001\", \"2f1003230001\"]", 
                                "[\"2f10032300011\", \"2f10032300012\")", 
                                "[\"2f10032300012\", \"2f10032300013\")", 
                                "[\"2f1003230002\", \"2f1003230002\"]", 
                                "[\"2f10032300021\", \"2f10032300022\")", 
                                "[\"2f10032300022\", \"2f10032300023\")", 
                                "[\"2f100323003\", \"2f100323003\"]", 
                                "[\"2f1003230031\", \"2f1003230032\")", 
                                "[\"2f1003230032\", \"2f1003230032\"]", 
                                "[\"2f10032300320\", \"2f10032300321\")", 
                                "[\"2f10032300321\", \"2f10032300322\")", 
                                "[\"2f10032300322\", \"2f10032300323\")"
                            ]
                        },
                        "keysExamined" : 129570,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 47632
                    }
                }
            ]
        }
    },

Indexes on the collection

{
"0" : {
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "wego.geoData"
},
"1" : {
    "v" : 1,
    "key" : {
        "srcId" : 1
    },
    "name" : "srcId_1",
    "ns" : "wego.geoData"
},
"2" : {
    "v" : 1,
    "key" : {
        "loc" : "2dsphere"
    },
    "name" : "loc_2dsphere",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
},
"3" : {
    "v" : 1,
    "key" : {
        "name" : 1
    },
    "name" : "name_1",
    "ns" : "wego.geoData"
},
"4" : {
    "v" : 1,
    "key" : {
        "loc" : "2dsphere",
        "categoriesIds" : 1,
        "name" : 1
    },
    "name" : "loc_2dsphere_categoriesIds_1_name_1",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
},
"5" : {
    "v" : 1,
    "key" : {
        "loc" : "2dsphere",
        "categoriesIds" : 1,
        "keywords" : 1
    },
    "name" : "loc_2dsphere_categoriesIds_1_keywords_1",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
}
}

Collection stats link

Eliezer
  • 171
  • 7
  • can you also post the "queryPlanner" section of the explain() for both queries? – joao Nov 26 '15 at 13:03
  • I added links to the whole "explain" results – Eliezer Nov 26 '15 at 17:18
  • According to the new full explain files both queries take the same amount of time (around 1800ms). Can you post all indexes you have for that collection (there are different indexes being used in the executionStats you posted but not in the files you linked)? – joao Nov 26 '15 at 17:38
  • You can always try to force a specific index usage with "hint" (if indexFilterSet is false, like in your case, otherwise the hint would be ignored). Can you check if the execution time is consistent with this? – joao Nov 26 '15 at 17:50
  • Sorry, I set a link to the wrong "explain" output for the "Cafe Ne" query. Now it is correct. – Eliezer Nov 26 '15 at 22:17
  • The loc_2dsphere_categoriesIds_1_keywords_1 index is { "loc" : "2dsphere", "categoriesIds" : 1, "keywords" : 1}, keywords is string array field. – Eliezer Nov 26 '15 at 22:17
  • I tried using hint on “loc_2dsphere_categoriesIds_1_name_1” but the performance worse – Eliezer Nov 26 '15 at 22:17
  • It seems like if mongo find a document that match the name on the index it returns immediately, but if it doesn’t, it search the whole documents in the index that match the location or the location and the categoriesIds. – Eliezer Nov 26 '15 at 22:17
  • A only workaround I found, and this is an ugly one :-), is to add a document to the collection that match the location categoriesIds and name before every find query I make and to filter it out in the query, later there is a cleaner that delete the junk – Eliezer Nov 26 '15 at 22:18
  • Maybe you could try separating the index into 3 different ones to give the query planner more options (intersection is still possible) and see how it goes. – joao Nov 27 '15 at 16:00
  • Can you please post the output of `db.collection.getIndices()`? I think I saw something... – Markus W Mahlberg Nov 28 '15 at 10:26
  • I added the list of indexes on the collection – Eliezer Nov 29 '15 at 07:42
  • Try removing the `loc_2dsphere_categoriesIds_1_name_1` index and replacing it with a compound index on `loc` and `name` instead. I would think that would provide enough selectivity to make the index effective without the multi-key bloat of including `categoriesIds`. – JohnnyHK Nov 30 '15 at 02:12
  • Did you try this with different indexes, as suggested? – Laizer Dec 01 '15 at 06:42
  • Yes I tried using loc_2dsphere_name_1 index, the results are similar. When mongo finds an item it return fast, but when there is no match it takes a lot of time to response. I also hint mongo to use this index but it did not help – Eliezer Dec 01 '15 at 16:36

4 Answers4

10

I am going to speculate here a bit, and then a comment about your design.

First, when you create an index on key which has an array on a value you create a record for each element of the array:

To index a field that holds an array value, MongoDB creates an index key for each element in the array.

This is from MongoDB own documentation about indecies.

So, if your typical record more than a hand full of categories and you have 7 million records, your index is huge, and it will also take time to scan the index itself to find that the index does not contain what you are looking for. It is still faster than a collection scan, but it darn slow compared to how fast it takes to find an existing record.

Now, let me comment about your schema design. This is a matter of style so feel free to ignore this part.

You have a record which might be in 17 categories. That is a bit overwhelming, and over abusing the term category. A category is a specific division, a way to quickly associate a thing with a group of things. What is a thing that belong to so many groups? Let's take for example your records Cafe Ne. I assume in the real world - and please remember, programming and applications are at best when the solve real world problems - Cafe Ne, is either a restaurant, a caffe, a jazz bar, a dinner. It's for sure not a garage (unless, cafe means cars in a language I don't know). I can hardly imagine it's a bank or a dental clinic. I'd have to really make an effort, to find more than 10 meaningful categories, that users search a cafe by.

My point is, even though mongodb allows you to design things like that, it does not mean you have to. Try to narrow the number of categories you have and the ones you look for, and you will have much better performance.

Community
  • 1
  • 1
oz123
  • 27,559
  • 27
  • 125
  • 187
  • 1
    The index size is not the problem. For existing documents the response returns very fast. The problem is that when it does not find the document by index it scan tens of thousands documents instead of give up and return no results. – Eliezer Nov 30 '15 at 10:34
  • Regarding the number of categories per document, on average there is about 1.1 categories per document, so it does not bloat the index that much. The “$in” condition is true when a document has at least one category of the list in the query – Eliezer Nov 30 '15 at 10:34
  • @Eliezer, if your search is vague, it takes time to search the index too. Can you post the sizes of your index? – oz123 Nov 30 '15 at 10:38
  • I added at the end of the question a link to the geoData.stats() output. The "loc_2dsphere_categoriesIds_1_name_1" index is about 2.5 times bigger then the "_id_" index – Eliezer Nov 30 '15 at 11:08
  • 1
    Actually, I did some testing. With documents generated with [this mgenerate template](https://gist.github.com/mwmahlberg/dcc878b18c3c825bed0d), the average size of an index entry is 3732 bytes, adding up to a whopping 26.1GB index size. I think that solves this subquestion. Expanding on @Oz123 I'd guess that the working set (aka "existing" documents) might hold most documents to be returned, and the cursor (which loads the docs lazily, iirc) simply reads the remaining ones in the background. With no match, no cursor can return after the first documents are found, so we have some disk operation – Markus W Mahlberg Dec 01 '15 at 12:13
  • @MarkusWMahlberg Looks that your comment can be an answer. – Tarlog Dec 02 '15 at 18:41
2

As JohnnyHK suggested in comments, and Oz123 pointed to in his answer, the issue here appears to be an index that has grown so large that it fails to perform well as an index. I believe that in addition to the category expansion issue that has already been pointed out, the ordering of fields in your index creates trouble. Compound indexes are built according to the order of fields, and putting name after categoriesIds makes it more costly to query on name.

It's clear that you need to tune your indexes. Exactly how you tune them depends on the types of queries that you are expecting to support. In particularly, I'm not sure if you'll see better performance from a compound index of loc and name or if you'll see better performance from individual indexes, one for loc and one for name. Mongo themselves are a little vague on when it's best to use a compound index and when it's best to use individual indexes and rely on index intersection.

My intuition says that individual indexes will perform better, but I'd test both scenarios.

If you anticipate needing to query by category as well, without name or loc fields that could narrow the query down, it's probably best to create a separate categoriesIds index.

Laizer
  • 5,932
  • 7
  • 46
  • 73
1

The order of the fields in a compound index is very important. It's hard to diagnose without having access to the real data and usage patterns, but this key might increase the odds of matching (or not) the document using only the index:

{
  "loc" : "2dsphere",
  "name" : 1,
  "categoriesIds" : 1
}
jbmartinez
  • 624
  • 5
  • 8
0

Not sure if it is exactly the same issue but we had a similar problem with a multikey index with poor performance when no results were found.

It is actually a Mongo bug that was fixed in v3.3.8. https://jira.mongodb.org/browse/SERVER-15086

We fixed our problems after upgrading Mongo and rebuilding the index.

Kuikiker
  • 156
  • 13