
We are using MongoDB (via PyMongo) to store a collection of points in our system. Our API returns this data in response to bounding-box queries (`$geoWithin`).

We want to limit the number of results returned to 200, sorted by distance from the center of the box, and I'm trying to figure out the best way to do this. Currently, we fetch all matching items (no limit), then calculate distances and sort in Python. However, those calculations are very slow and memory-intensive for large datasets.

Does anyone have better suggestions? Other SO questions say that it isn't possible to sort bounding-box queries, but most of those questions are more than two years old.

keithhackbarth
  • Is http://api.mongodb.org/python/current/api/pymongo/cursor.html#pymongo.cursor.Cursor.limit not OK? – silviud Mar 12 '15 at 18:01
  • `$geoWithin` doesn't sort results. You can use `$near`, or the `$geoNear` aggregation stage with an `$in` query on the 200 `_id`s from the first `$geoWithin` query, to have the server sort your results by distance. That should probably be faster than doing the calculations and sorting on the application side. – wdberkeley Mar 12 '15 at 18:10
  • @silviud - Just using a limit? No, the problem is more complicated than that. – keithhackbarth Mar 12 '15 at 18:26
  • @wdberkeley - Good suggestion. I threw 200 out there as an arbitrary limit. The actual result set might be much bigger (think 20,000), and we start running into memory problems on that side too. – keithhackbarth Mar 12 '15 at 18:26
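A minimal sketch of wdberkeley's suggestion, assuming the points are GeoJSON points in a field with a 2dsphere index: a first `$geoWithin` query collects the `_id`s, and a `$geoNear` aggregation stage restricted to those ids through its `query` option returns them nearest-first, writing each document's distance (in metres, because `near` is a GeoJSON point) to an output field. The function name `build_distance_sort_pipeline` and the output field name `distance` are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def build_distance_sort_pipeline(
    ids: Sequence[Any], center_lon: float, center_lat: float
) -> List[Dict[str, Any]]:
    """Hypothetical helper: pipeline returning the documents whose _id is in
    `ids`, ordered nearest-first from (center_lon, center_lat)."""
    return [
        {
            "$geoNear": {
                # GeoJSON point: coordinates are [longitude, latitude]
                "near": {"type": "Point", "coordinates": [center_lon, center_lat]},
                # Output field that receives each document's distance
                "distanceField": "distance",
                "spherical": True,
                # Only consider documents already found by the $geoWithin query
                "query": {"_id": {"$in": list(ids)}},
            }
        }
    ]
```

Usage would look roughly like `ids = [d['_id'] for d in coll.find({'point': {'$geoWithin': {'$box': box}}}, {'_id': 1}).limit(200)]` followed by `list(coll.aggregate(build_distance_sort_pipeline(ids, lon, lat)))`. `$geoNear` has to be the first stage of a pipeline; if the collection has more than one geospatial index, the stage's `key` option selects which field to use.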

1 Answer


Alright, I think I figured out a solution. It turns out that you can use both a point/radius query and a bounding-box query on the same field in a single query.

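    import math

    # Combine a bounding-box query and a point/radius query on the same field.
    # $geoWithin keeps only points inside the box; $near makes the server
    # return them sorted by distance from the box's centre.
    # Assumes self.geo.centroid is a shapely Point (x = longitude, y = latitude)
    # and that haversine() returns metres, because $near with a GeoJSON
    # $geometry interprets $maxDistance in metres.
    centroid = self.geo.centroid

    # The circle must contain the entire box. The south corners are the
    # farthest from the centre for boxes north of the equator (use a north
    # corner south of it); round up so corner points are not excluded.
    max_distance = math.ceil(haversine(sw_lat, sw_lon, centroid.y, centroid.x))

    self.new_query = {
        '$and': [
            {'point': {
                '$geoWithin': {'$box': box}
            }},
            {'point': {
                # $near is the documented query operator;
                # $geoNear is the name of the aggregation stage
                '$near': {
                    '$geometry': {
                        'type': 'Point',
                        # GeoJSON order is [longitude, latitude]
                        'coordinates': [centroid.x, centroid.y],
                    },
                    '$maxDistance': max_distance,
                }
            }},
        ]
    }

    # Results come back nearest-first, so a limit keeps the closest points:
    # collection.find(self.new_query).limit(200)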
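The snippet above relies on a `haversine` helper that isn't shown, and with a GeoJSON `$geometry`, `$near` reads `$maxDistance` in metres. Here is a self-contained sketch under those assumptions: `haversine_m` and `build_box_near_query` are hypothetical names, the field is assumed to be called `point` (GeoJSON points with a 2dsphere index, as in the answer), and the box centre is the plain average of the corners, so boxes crossing the 180° meridian are not handled. Taking the farthest of the four corners as the radius covers the whole box in either hemisphere, for boxes less than 180° wide.

```python
import math
from typing import Any, Dict


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Hypothetical helper: great-circle distance in metres (mean Earth radius)."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def build_box_near_query(
    sw_lon: float, sw_lat: float, ne_lon: float, ne_lat: float, field: str = "point"
) -> Dict[str, Any]:
    """Hypothetical helper: filter to the box and sort nearest-first from its centre."""
    # Plain average of the corners; does not handle boxes crossing the 180° meridian.
    center_lon = (sw_lon + ne_lon) / 2
    center_lat = (sw_lat + ne_lat) / 2
    # The farthest corner gives a radius whose circle contains the whole box;
    # round up so points exactly at that corner are not excluded.
    corners = [(sw_lat, sw_lon), (sw_lat, ne_lon), (ne_lat, sw_lon), (ne_lat, ne_lon)]
    max_distance = math.ceil(
        max(haversine_m(center_lat, center_lon, lat, lon) for lat, lon in corners)
    )
    return {
        "$and": [
            # $box takes [bottom-left, upper-right] as [lon, lat] pairs
            {field: {"$geoWithin": {"$box": [[sw_lon, sw_lat], [ne_lon, ne_lat]]}}},
            {field: {"$near": {
                # GeoJSON point, so $maxDistance is interpreted in metres
                "$geometry": {"type": "Point", "coordinates": [center_lon, center_lat]},
                "$maxDistance": max_distance,
            }}},
        ]
    }
```

With PyMongo, `collection.find(build_box_near_query(sw_lon, sw_lat, ne_lon, ne_lat)).limit(200)` should then return the (at most) 200 points in the box closest to its centre, without fetching the rest into Python.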
keithhackbarth