-1

following problem: I've got an Array with thousands of entries (people with id and geolocation (lat,long)). The aim is to connect each person with another person within a given radius (e.g. 20km). I'm looking for an efficient way to do so. I've already tried Geohashes but since every entry of the array needs to be compared with every other entry, the execution time is horrible when scaling. I would appreciate any great hint! Thanks a lot. I'm using a NodeJS server for the matching algorithm.

2 Answers2

0

I don't understand why you need to compare the geohash of every entry with every other entry?

Start with an empty Map where each key is a geohash and each value is a Set of ids. For each entry, compute the geohash and add the entry id to the related Set. Now for each entry you can pick any other id from the set, if there is no other you'll need to decrease the precision. (Implementation will be bit different if an entry cannot be matched with multiple others.)

Michiel
  • 344
  • 2
  • 9
  • What do you mean by `geohash`? – Anatoly Dec 20 '20 at 15:28
  • @Anatoly So, if you are not comfortable with geo hashes, you can treat the Latitude and longitude as a Value of a vector (Long, Lat), and calculate the distance between the two vectors, and use them to identify if distance is you desired value, this can be an alternative :) – SARAN SURYA Dec 20 '20 at 15:32
  • @Anatoly "Geohash ... encodes a geographic location into a short string of letters and digits." See https://en.wikipedia.org/wiki/Geohash. Any other algorithm that will lose some precision of the location will work as well. The idea is that close locations will map to the same value. – Michiel Dec 20 '20 at 17:47
0

I can think of several improvements here:

  1. If you calculate a distance from Ai point to Aj point then you already have a distance from Aj to Ai. It means you have N*(N-1)/2 comparisons instead of N*(N-1).
  2. Because you have a constant radius You can take radius * radius (i.e radius^2) and not to use a square root to compare a calculated distance with a given radius
  3. If all calculations are synchronous you should use something like worker threads because Node.Js is single-threaded by design.

Maybe you can take a glance at JSTS package because it may save you some time with geospatial calculations.

Another approach is to load all data into DB with a geospatial support, add needed geospatial indexes and use specialized geospatial functions of this DB to calculate a distance and group people by it

Anatoly
  • 20,799
  • 3
  • 28
  • 42