I have a collection with coordinate data in GeoJSON Point form, from which I need to query for the 10 latest entries within an area. There are now 1.000.000 entries but there will be about 10 times more.
My problem is that when there are lots of entries within the desired area, the performance of my queries drops tremendously (case 3). The test data I currently have is random, but the real data won't be, so picking another index (like in the case 4) based purely on the dimensions of the area won't be possible.
What should I do to get it perform predictably regardless of the area?
1. The collection statistics:
> db.randomcoordinates.stats()
{
"ns" : "test.randomcoordinates",
"count" : 1000000,
"size" : 224000000,
"avgObjSize" : 224,
"storageSize" : 315006976,
"numExtents" : 15,
"nindexes" : 3,
"lastExtentSize" : 84426752,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 120416128,
"indexSizes" : {
"_id_" : 32458720,
"position_2dsphere_timestamp_-1" : 55629504,
"timestamp_-1" : 32327904
},
"ok" : 1
}
2. The indexes:
> db.randomcoordinates.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.randomcoordinates",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"position" : "2dsphere",
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "position_2dsphere_timestamp_-1"
},
{
"v" : 1,
"key" : {
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "timestamp_-1"
}
]
3. Find using 2dsphere compound index:
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116775,
"nscanned" : 283424,
"nscannedObjectsAllPlans" : 116775,
"nscannedAllPlans" : 283424,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3876,
"indexBounds" : {
},
"nscanned" : 283424,
"matchTested" : NumberLong(166649),
"geoTested" : NumberLong(166649),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
4. Find using timestamp index:
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
"cursor" : "BtreeCursor timestamp_-1",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 63,
"nscanned" : 63,
"nscannedObjectsAllPlans" : 63,
"nscannedAllPlans" : 63,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"timestamp" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "chan:27017"
}
Some have suggested to use {timestamp: -1, position: "2dsphere"}
index, so I tried that out as well, but it doesn't seem to perform well enough.
5. Find using Timestamp + 2dsphere compound index
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1_position_2dsphere").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116953,
"nscanned" : 286513,
"nscannedObjectsAllPlans" : 116953,
"nscannedAllPlans" : 286513,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 4597,
"indexBounds" : {
},
"nscanned" : 286513,
"matchTested" : NumberLong(169560),
"geoTested" : NumberLong(169560),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}