Location/Proximity search on large record set

Question

Say I have a database table representing users with potentially millions of records (Wishful thinking). This table contains a whole bunch of information about each user including information about their location:

City
County/State etc
Country
Latitude
Longitude
Geohash based on the latitude/longitude values.

I would like to implement a feature where by a logged in user can search for other users that are nearby.

Ideally, I would like to grab say the 20 users that are geographically closest to the user, followed by the next 20, and the next 20 etc. So essentially I want to be able to order my users table by the distance from a certain point.

Approach 1

I have some previous experience with the haversine formula which I used to calculate the distance between one point and a few hundred others. This approach would be ideal on a relatively small record set but I fear it would become incredibly slow with such a large record set.

Approach 2

I've additionally done some research into geohashing and I understand how the hash is calculated and I get the theory behind how it represents a location and how precision is lost with shorter resolutions. I could of course grab the users that are located near the user's geographical area by grabbing users that have a similar beginning to their geohash (Based on a precision I specify - and potentially looking in the neighbouring regions) but that doesn't solve the problem of needing to sort by location. This approach is also not great for edge cases where 2 users may be very close to one another but lie close to the edges of 2 regions represented by the geohash.

Any ideas/suggestion towards the approach would be greatly appreciated. I'm not looking for code in particular but links to good examples and resources would be helpful.

Thanks, Jonathon

Edit

Approach 3

After some thought I've come up with another potential solution to consider. Upon receiving each user's location information, I would store information about the location (town/city, area, country, latitude, longitude, geohash maybe) in a separate table (say locations). I would then connect the user to the location by a foreign key. This would give me a much smaller dataset to work with. To find nearby users I could then simply find other locations that are close to the user's location and then use their IDs to find other users. Perhaps some sort of caching could be then implemented by storing a list of the nearby location IDs for each location.

Check out [MySQL’s Extensions for Spatial Data](http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html) … see also http://www.plumislandmedia.net/mysql/using-mysqls-geospatial-extension-location-finder/ — CBroe, Jan 26 '15 at 16:31
Downvote = "This question does not show research effort; it is unclear or not useful". To downvoter - where exactly does my question fit into this category? — Jonathon, Jan 27 '15 at 12:27

Micromega · Answer 1 · 2015-01-26T22:40:46.610

0

You can try a space filling curve. Translate the co-ordinate to a binary and interleave it. Treat it as base-4 number. You are also wrong a geohash can be used to sort also by location. Most likely use a bounding box and filter the solution and then use the harvesine formula.

edited Jan 26 '15 at 22:40

answered Jan 26 '15 at 16:06

Micromega

12,486
7
35
72

Thanks for your answer. I'll have a look into a space filling curve. What do you mean by "You are also wrong the result of a geohash is also sorted by distance"? – Jonathon Jan 26 '15 at 17:28
@Jonathon:You can also use geohash to sort by location. – Micromega Jan 26 '15 at 17:33
I know you can use it to group by location i.e. determine which geohashes begin with similar X characters. Can you explain how I could use it to order by location (Preferably using MySQL)? – Jonathon Jan 27 '15 at 09:04
@Jonathon:Each X character is t – Micromega Jan 27 '15 at 12:08
@Jonathon:Mysql uses r-trees. But X similar character is because you treat the binary as base-4. You can sort by the binary. – Micromega Jan 27 '15 at 12:10
But can you sort by distance in every direction? Given a geohash, is it possible to find from a table of potentially millions of records, the geographically closest 20 users by their geohashes? It might be relatively easily possible to sort them linearly in either ascending or descending order but, does that account for every direction (N, E, S, W, NE, NW, SE, SW, NNE etc.)? – Jonathon Jan 27 '15 at 12:24
@Jonatha:Not sure whats your question? Geohash is an approximately algorithm. but its very useful. – Micromega Jan 27 '15 at 12:29
I know geohash is for approximation and represents an area of a grid. I am asking how can I sort a large result set by location/proximity. Given User A at a position, how can I find the X geographically closest users? If a geohash is a possible solution - I would like to know how this can be achieved – Jonathon Jan 27 '15 at 12:33
@Jonathon:Use the geohash to sort by location. (Not in MySql). – Micromega Jan 27 '15 at 12:49
@Jonathon:Like in the wikipedia article you can use a geohash with other mapping projections. Perhaps it improves the direction thing. – Micromega Jan 28 '15 at 19:36

Location/Proximity search on large record set

1 Answers1