0

So if I have a query that does the following (in pseudo code)

find(a nearby x, b > y).sort(c)

where a is a geo column, b is type of long, and c is also a type of long

Would the compound index on (a:2d, b:1, c:1) work and suggested?

tom
  • 14,273
  • 19
  • 65
  • 124

1 Answers1

2

Geospatial queries have their own index category (as you mention), and the geohashing greatly improves the index performance of the first key lookup--it's better than a range if you can set it up right. In any case, I think your strategy will work: the key will be setting $maxDistance to something fairly small.

I inserted 10 million random geo records to match your description, like so:

{ "_id" : ObjectId("4f28e1cffc90631d239f8b5a"), "a" : [ 46, 47 ], "b" : ISODate("2012-02-01T06:53:25.543Z"), "c" : 19 }
{ "_id" : ObjectId("4f28e1bdfc90631d239c4272"), "a" : [ 54, 48 ], "b" : ISODate("2012-02-01T06:53:32.699Z"), "c" : 20 }
{ "_id" : ObjectId("4f28e206fc90631d23aac59d"), "a" : [ 46, 52 ], "b" : ISODate("2012-02-01T06:55:14.103Z"), "c" : 22 }
{ "_id" : ObjectId("4f28e1a7fc90631d23995700"), "a" : [ 54, 52 ], "b" : ISODate("2012-02-01T06:52:33.312Z"), "c" : 27 }
{ "_id" : ObjectId("4f28e1d7fc90631d23a0e9e7"), "a" : [ 52, 46 ], "b" : ISODate("2012-02-01T06:53:11.315Z"), "c" : 31 }

With the maxDistance at something below 10 the performance is really quite good.

db.test13.find({a:{$near:[50,50], $maxDistance:4}, b:{$gt:d}}).sort({c:1}).explain();
{
"cursor" : "GeoSearchCursor",
"nscanned" : 100,
"nscannedObjects" : 100,
"n" : 100,
"scanAndOrder" : true,
"millis" : 25,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

If you leave out maxDistance it starts to suffer quite a bit. Some of the queries took up to 60 seconds to run. The secondary range parameter doesn't seem to help much, even if the range is fairly narrow--it seems to be all about the maxDistance.

I recommend you play around with it to get a feel for how the geospatial index performs. Here is my test insert loop. You can try limiting the bits as well for less resolution

function getRandomTime() {
   return new Date(new Date() - Math.floor(Math.random()*1000000));
}

function getRandomGeo() {
   return [Math.floor(Math.random()*360-180),Math.floor(Math.random()*360-180)];
}

function initialInsert() {
   for(var i = 0; i < 10000000; i++) {
      db.test13.save({
         a:getRandomGeo(),
         b:getRandomTime(),
         c:Math.floor(Math.random()*1000)
      });
   }
}
Eve Freeman
  • 32,467
  • 4
  • 86
  • 101
  • What bothers me that what's mentioned in http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ where it states 3. Only use a range query or sort on one column, any comments on that? – tom Feb 01 '12 at 05:49
  • It's much faster if you pull out one of the range queries. Do you really need to query like that? – Eve Freeman Feb 01 '12 at 05:56
  • Yeah, I need find entities within a given distance (column a:2d, and assume the $near is a range query) and since a given time (column b:1) and then do sort against another column c:1. Is there a better way to achieve the same result? – tom Feb 01 '12 at 06:04
  • If you make it a="something" or b="something", it will be fast. – Eve Freeman Feb 01 '12 at 06:04
  • Hmmm. I would try it out--$near uses some geohashing in it so it might not behave too badly. – Eve Freeman Feb 01 '12 at 06:13
  • If you change your question to mention geospatial (closer to your real query) I'll rework my answer with some examples. – Eve Freeman Feb 01 '12 at 06:15
  • Just made the change suggested to the title and the question. – tom Feb 01 '12 at 06:20
  • Thanks a lot, this really helps! I do have different level of distance to filter the data and cap it to no more than 5 miles, which seems will work. I assume another factor also matters - average # of entities within a given distance (range). – tom Feb 02 '12 at 02:09
  • Indeed. The amount in the range will make a difference as well--you can probably decrease the "bit" value for the geospatial index to try different hash resolutions. The default is like 1 foot, which is probably too fine for most purposes. – Eve Freeman Feb 02 '12 at 15:08