Nearest Neighbor Algorithm in R-Tree

Question

I'm reading Guttman's paper (link to paper/book).

I was wondering how nearest neighbor queries work with R-trees, or how they are actually implemented. My idea is to traverse the tree starting at the root and check whether one of the entries includes the query point.

So my first question: if a rectangle includes the query point, that does not mean all rectangles inside it are automatically the nearest to the query point, right? Couldn't there be another rectangle at a smaller distance, even though the query point does not lie inside it?
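Containment alone does not decide which rectangle is nearest, and this is easy to check numerically. Nearest-neighbor searches on R-trees usually rank rectangles by MINDIST, the distance from the query point to the closest point of a rectangle (zero when the point is inside it). The following minimal Python sketch uses made-up coordinates; the function name `mindist` and the rectangle layout `(xmin, ymin, xmax, ymax)` are assumptions for illustration:

```python
import math


def mindist(point, rect):
    """Distance from point (x, y) to the nearest point of rect (xmin, ymin, xmax, ymax).

    Returns 0.0 when the point lies inside (or on the border of) the rectangle.
    """
    x, y = point
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)  # horizontal gap, 0 if x is within [xmin, xmax]
    dy = max(ymin - y, 0.0, y - ymax)  # vertical gap, 0 if y is within [ymin, ymax]
    return math.hypot(dx, dy)


# Hypothetical layout: the query lies inside the large rectangle A, but the only
# object stored under A sits in its far corner. Rectangle B does not contain the
# query point, yet it is much closer.
query = (0.5, 5.0)
rect_a = (0.0, 0.0, 10.0, 10.0)
object_in_a = (9.0, 9.0, 10.0, 10.0)
rect_b = (-1.0, 4.0, -0.5, 6.0)

print(mindist(query, rect_a))       # 0.0  (A contains the query point)
print(mindist(query, object_in_a))  # ≈ 9.39
print(mindist(query, rect_b))       # 1.0  (B is closer than anything in A)
```

A search that only follows rectangles containing the query point would report the object in A and miss B.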

Second, suppose the query is actually a minimum bounding box, for example mbr = [left, bottom, right, top], and I want all rectangles that overlap this region, or better, all rectangles whose centroid lies inside the given region. Is this also possible?
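A window query of this kind can be sketched as a recursive descent that prunes every subtree whose bounding box does not intersect the query region; the overlap part follows the same idea as the search procedure in Guttman's paper. The Python sketch below is hypothetical: the `Node` class, its `(mbr, child)` entries and the `by_centroid` flag are illustrative assumptions, not Guttman's data structures or any library's API. The centroid test only runs at the leaves, and the intersection pruning stays valid for it because a rectangle whose centroid lies inside the window necessarily overlaps the window.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (left, bottom, right, top)


@dataclass
class Node:
    # Hypothetical minimal R-tree node: each entry is (mbr, child), where child is
    # a Node for inner nodes or an object id (str) for leaf entries.
    leaf: bool
    entries: List[Tuple[Rect, Union["Node", str]]] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def centroid_inside(r: Rect, window: Rect) -> bool:
    cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
    return window[0] <= cx <= window[2] and window[1] <= cy <= window[3]


def window_query(node: Node, window: Rect, by_centroid: bool = False) -> List[str]:
    """Ids of leaf rectangles overlapping `window`, or, with by_centroid=True,
    ids of leaf rectangles whose centroid lies inside `window`."""
    result = []
    for mbr, child in node.entries:
        # Prune: a subtree whose MBR misses the window cannot contain a match.
        if not intersects(mbr, window):
            continue
        if node.leaf:
            if not by_centroid or centroid_inside(mbr, window):
                result.append(child)
        else:
            result.extend(window_query(child, window, by_centroid))
    return result


# Hypothetical two-level tree with three stored rectangles.
leaf1 = Node(True, [((0, 0, 2, 2), "a"), ((3, 3, 8, 8), "b")])
leaf2 = Node(True, [((10, 10, 12, 12), "c")])
root = Node(False, [((0, 0, 8, 8), leaf1), ((10, 10, 12, 12), leaf2)])
window = (0.5, 0.5, 5, 5)

print(window_query(root, window))                    # ['a', 'b']
print(window_query(root, window, by_centroid=True))  # ['a']
```

Here "b" overlaps the window but its centroid (5.5, 5.5) lies outside it, and the subtree holding "c" is never visited.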

TilmannZ · Accepted Answer · 2018-06-20T16:28:30.420

EDIT

After numerous experiments, I found that the algorithm by

Hjaltason, Gísli R., and Hanan Samet. "Distance browsing in spatial databases." ACM Transactions on Database Systems (TODS) 24.2 (1999): 265-318.

(as posted in the answer by @Anony-Mousse) is clearly superior to the algorithms I describe here.
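The core idea of Hjaltason and Samet's distance browsing can be sketched as a best-first search: nodes and objects share a single priority queue ordered by MINDIST, and whenever an object reaches the front of the queue it is the next nearest neighbour. This is a simplified Python sketch under stated assumptions (a hypothetical `Node` class, rectangles as `(xmin, ymin, xmax, ymax)`, Euclidean distance in 2-D), not the paper's full algorithm:

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    # Hypothetical minimal R-tree node: entries are (mbr, child) pairs; child is a
    # Node for inner nodes and an object id (str) for leaf entries.
    leaf: bool
    entries: List[Tuple[Rect, Union["Node", str]]] = field(default_factory=list)


def mindist(p, r: Rect) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def browse(root: Node, q) -> Iterator[Tuple[float, str]]:
    """Yield (distance, object id) pairs in increasing distance from q."""
    tie = itertools.count()  # tie-breaker so heapq never has to compare Nodes
    heap = [(0.0, next(tie), False, root)]
    while heap:
        dist, _, is_object, item = heapq.heappop(heap)
        if is_object:
            # Every queued node or object is at least this far away, because a
            # node's MINDIST never exceeds the distance of anything inside it.
            yield dist, item
            continue
        for mbr, child in item.entries:
            # Children of a leaf are objects; children of inner nodes are nodes.
            heapq.heappush(heap, (mindist(q, mbr), next(tie), item.leaf, child))


leaf1 = Node(True, [((0, 0, 1, 1), "a"), ((4, 4, 5, 5), "b")])
leaf2 = Node(True, [((8, 8, 9, 9), "c"), ((2, 6, 3, 7), "d")])
root = Node(False, [((0, 0, 5, 5), leaf1), ((2, 6, 9, 9), leaf2)])
q = (3, 3)

print([oid for _, oid in itertools.islice(browse(root, q), 3)])  # ['b', 'a', 'd']
```

Because `browse` is a generator, the caller can stop after any number of results (here with `itertools.islice`) without fixing k in advance, which is the "browsing" aspect of the approach.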

OLD ANSWER:

As far as I know, the best kNN search algorithm is the one by

Cheung, King Lum, and Ada Wai-Chee Fu. "Enhanced nearest neighbour search on the R-tree." ACM SIGMOD Record 27.3 (1998): 16-21. (Copied from the answer by @Anony-Mousse; PDF download.)

The basic algorithm is also explained in this presentation.

If I remember correctly, it does the following things:

Traverse all nodes in the tree, except if they can be excluded based on the current maximal known distance.
Order candidate subnodes before traversing them such that the 'closest' subnodes are traversed first.
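The two steps above correspond to a depth-first branch-and-bound search. The sketch below is a minimal Python illustration, not the exact algorithm from Cheung and Fu: it assumes a hypothetical `Node` class, Euclidean distance in 2-D, and ordering of subnodes by MINDIST (the cited papers also discuss other metrics such as MINMAXDIST). The k best candidates are kept in a max-heap, and the k-th best distance plays the role of the "current maximal known distance" used for pruning:

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    # Hypothetical minimal R-tree node: entries are (mbr, child) pairs; child is a
    # Node for inner nodes and an object id (str) for leaf entries.
    leaf: bool
    entries: List[Tuple[Rect, Union["Node", str]]] = field(default_factory=list)


def mindist(p, r: Rect) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def knn(root: Node, q, k: int) -> List[Tuple[float, str]]:
    best: List[Tuple[float, str]] = []  # max-heap via negated distances: (-dist, id)

    def kth_dist() -> float:
        # Pruning bound: distance of the k-th best candidate found so far.
        return -best[0][0] if len(best) == k else math.inf

    def visit(node: Node) -> None:
        # Order entries so that the 'closest' ones are traversed first.
        ranked = sorted((mindist(q, mbr), i) for i, (mbr, _) in enumerate(node.entries))
        for d, i in ranked:
            # Exclude entries that cannot beat the current bound; since `ranked`
            # is sorted, all remaining entries are at least as far away.
            if d >= kth_dist():
                break
            child = node.entries[i][1]
            if node.leaf:
                heapq.heappush(best, (-d, child))
                if len(best) > k:
                    heapq.heappop(best)  # drop the farthest candidate
            else:
                visit(child)

    visit(root)
    return sorted((-nd, oid) for nd, oid in best)


leaf1 = Node(True, [((0, 0, 1, 1), "a"), ((4, 4, 5, 5), "b")])
leaf2 = Node(True, [((8, 8, 9, 9), "c"), ((2, 6, 3, 7), "d")])
root = Node(False, [((0, 0, 5, 5), leaf1), ((2, 6, 9, 9), leaf2)])
q = (3, 3)

print([oid for _, oid in knn(root, q, 3)])  # ['b', 'a', 'd']
print([oid for _, oid in knn(root, q, 2)])  # ['b', 'a']
```

With `k=2`, the second leaf (MINDIST 3.0) is never visited, because two candidates within about 2.83 are already known when it comes up.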

As a result, this algorithm finds the closest neighbours very quickly and traverses few, if any, nodes that do not contain part of the final result.

Interestingly, the algorithm by Cheung et al. improves on previous algorithms by removing some checks that were meant to exclude even more subnodes before traversing them. They showed that these additional checks could never exclude any further nodes.

score 3 · Answer 2 · answered Aug 22 '17 at 18:37

There are many papers on finding nearest neighbors in R-trees.

Roussopoulos, Nick, Stephen Kelley, and Frédéric Vincent. "Nearest neighbor queries." ACM SIGMOD Record. Vol. 24. No. 2. ACM, 1995.

Papadopoulos, Apostolos, and Yannis Manolopoulos. "Performance of nearest neighbor queries in R-trees." Database Theory—ICDT'97 (1997): 394-408.

Hjaltason, Gísli R., and Hanan Samet. "Distance browsing in spatial databases." ACM Transactions on Database Systems (TODS) 24.2 (1999): 265-318.

Cheung, King Lum, and Ada Wai-Chee Fu. "Enhanced nearest neighbour search on the R-tree." ACM SIGMOD Record 27.3 (1998): 16-21.

Berchtold, S., C. Böhm, D. A. Keim, and H. P. Kriegel. "A cost model for nearest neighbor search in high-dimensional data space." Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, May 1997: 78-86.
