Most suitable Closest Neighbor Search for 2D points

Question

This problem arises from creating closest pair of nodes from two sets/surfaces in a FEM mesh. Here the basic logic of what I am trying to do:

I have two surfaces/sets of nodes which are coplanar (can be treated as 2D even if it's a 3D space). From each node from the first set, I'd like to find the closest node in the second set.

Example:

Set 1: (0,0), (1,0), (2,0), (3,0) (4 points)

Set 2: (1.1,0), (0.1,0), (3.1,0), (2.1,0), (2.2,0) (5 points to query)

result vector -> (2,1,4,3,3) (index starts at 1)

I have been looking at many available open-source algorithms (similar to this benchmark: https://github.com/erikbern/ann-benchmarks), both exact and approximate ones. However, before trying them one by one, I was wondering if any of you had any experience/advice for the algorithm I am looking for. I need only the closest node and the number of nodes can vary anywhere to 100k nodes in a 2D space. Speed and efficiency are the major factor.

Thank you in advance!

I have been looking at many algorithms and decided to try a few; however, I'd like to hear the opinion of others as they might recommend some algorithms compared to others. As of right now, I was going to use this ANN (http://www.cs.umd.edu/~mount/ANN/) which seems much simpler and lighter than other massive open source projects.

There are several `algorithms` to solve the problem(`Brute Force`,`Ball Tree`), each has its pros and cons. And choosing the right algorithm also depends on several criterias. I would recommend to try each of them according to your needs and decide which one suits your demand. — Farkhod Abdukodirov, Apr 26 '23 at 00:25

score 0 · Answer 1 · answered Apr 26 '23 at 13:19

As I understand, the ann-benchmarks are for high dimensionality (i.e. 32 dimensions or more). However you are looking for a 2D algorithm.

If you have access to a GPU, the fastest way is probably a GPU based approach, but that comes with some complexity in terms of requiring a larger project to depend on (which you want to avoid I think).

Therefore I would go with one of the traditional spatial indexes: R-tree, quadtree or kd-tree. LSH or a grid index may also be an option.

Now there are several points to consider:

FEM model: I imagine your points move a lot. This means you either need to rebuild the index after every step (I found quadtrees to be fastest at that, kd-trees are also not bad). Or you update the tree by "moving" points around. Again, quadtrees seem to be the best here.
Implementation quality: I found this to be often at least as important as the actual algorithm. First, a sub-par implementation can let a good algorithm perform badly. Second, I found many projects out there that are actually not fully correct, they may not always find the correct nearest neighbor, especially after moving/removing points.

As for LSH, I have very little experience with that you may have to look it up yourself.

As for gridindexes, I am not aware of any open-source implementation, but they a easy to implement yourself (basically you partition your data into a grid (e.g. 10m x 10m). Every grid cell simply contains a list or set of you points. For nearest neighbor search you then have to brute-force-search all points in a given cell and its neighbors (if the closest neighbor is closer than the closest point inside a cell).

I compared several Java implementations here: PDF, 1-NN-queries on 2D point data are in figure 28a, update rates are in figure 40a), index-build performance is in figure 2a). As reference I used an openstreetmap dataset representing the Alps in Europe.

Most suitable Closest Neighbor Search for 2D points

1 Answers1