
The Wikipedia article on k-d trees stores points in the inner nodes. I have to perform nearest-neighbour (NN) queries and I think (newbie here) I understand the concept.

However, I was told to study k-d trees from Computational Geometry: Algorithms and Applications (De Berg, Cheong, Van Kreveld and Overmars), section 5.2, page 99. The main difference I can see is that Overmars places the splitting data in the inner nodes and the actual points of the dataset in the leaves. For example, in 2D, an inner node will hold the splitting line.

Wikipedia, on the other hand, seems to store points in both inner nodes and leaves (while Overmars stores them only in leaves).

In this case, how do we perform nearest-neighbour search? Moreover, why is there this difference?

gsamaras
  • This site is mostly for questions about specific programming problems. You might get a better response on [Computer Science](http://cs.stackexchange.com). – 500 - Internal Server Error Apr 10 '14 at 13:08
  • I wasn't aware of the other site. I will post the question there too, thanks. – gsamaras Apr 10 '14 at 13:12
  • I would say that the wiki article makes explicit that the cutting planes always go through points of your dataset. That is an additional constraint that is not necessary, as can be seen from De Berg et al. (Out of curiosity: is it explicitly noted that that chapter is written by Overmars? I always considered Mark de Berg the main author.) The position of your cut does not influence algorithms operating on kd-trees. – Vincent van der Weele Apr 10 '14 at 13:17
  • Yes, that seems to be a constraint that the book does not impose. I do not know who wrote the specific section (Overmars is one word :) ). I posted the question here too http://cs.stackexchange.com/questions/23636/kd-tree-stores-points-in-inner-nodes-if-yes-how-to-search-for-nn – gsamaras Apr 10 '14 at 13:18
  • [You shouldn't cross-post to multiple SE sites](http://meta.stackexchange.com/questions/64068/is-cross-posting-a-question-on-multiple-stack-exchange-sites-permitted-if-the-qu). – Bernhard Barker Apr 10 '14 at 14:14
  • Obviously, but I do not know how to delete this post and, as I have mentioned before, I didn't know about the other site. :) – gsamaras Apr 10 '14 at 17:25

1 Answer


Default k-d-trees should split the data set at a point. This point is then stored in the inner node, and checked as a candidate neighbor when you walk down the tree at search time.
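
As a rough illustration of this default variant, here is a minimal Python sketch. The names `Node`, `build_kdtree`, `nearest` and `sq_dist` are made up for this example, and splitting at the median of a sorted list is just one simple way to pick the splitting point. Every inner node stores a data point, which is tested as a candidate when the search passes through it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                    # data point stored in this node
    axis: int                       # coordinate this node splits on
    left: Optional["Node"] = None   # points with coord <= point[axis]
    right: Optional["Node"] = None  # points with coord >= point[axis]


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Split at the median point; the median itself stays in the node."""
    if not points:
        return None
    axis = depth % len(points[0])   # cycle through the coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    if node is None:
        return best
    # The point stored in the node is itself a candidate neighbor.
    if best is None or sq_dist(node.point, query) < sq_dist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Backtrack: visit the far side only if the splitting plane is closer
    # than the best candidate found so far.
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, best)
    return best
```

Usage would look like `nearest(build_kdtree(points), query)`, which returns `None` for an empty point list.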

Of course you can have various variants of k-d-trees where the split may be at a different place, and when there is no element exactly at the splitting position, you can't have one in the inner node anymore.
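
For the variant described in the question, where inner nodes hold only the splitting coordinate and the data points sit in the leaves, a search could look like the sketch below. This is an illustration under my own assumptions, not the book's pseudocode: `Leaf`, `Split`, `build_leaf_tree` and `nearest_in_leaf_tree` are hypothetical names, and the split value is taken to be the median coordinate of the current subset. The backtracking test is the same as in the point-storing variant; the only change is that candidates are examined at the leaves.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Leaf:
    point: Point            # data points live only in leaves


@dataclass
class Split:
    axis: int               # coordinate this node splits on
    value: float            # position of the splitting line/plane
    left: "Tree"            # points with coord <= value
    right: "Tree"           # points with coord >= value


Tree = Union[Leaf, Split]


def build_leaf_tree(points: List[Point], depth: int = 0) -> Tree:
    """Assumes a non-empty list of points of equal dimension."""
    if len(points) == 1:
        return Leaf(points[0])
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2          # left half gets ceil(n / 2) points
    return Split(axis, pts[mid - 1][axis],
                 build_leaf_tree(pts[:mid], depth + 1),
                 build_leaf_tree(pts[mid:], depth + 1))


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_in_leaf_tree(node: Tree, query: Point,
                         best: Optional[Point] = None) -> Point:
    if isinstance(node, Leaf):
        # Only leaves contribute candidate neighbors.
        if best is None or (squared_distance(node.point, query)
                            < squared_distance(best, query)):
            return node.point
        return best
    diff = query[node.axis] - node.value
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    best = nearest_in_leaf_tree(near, query, best)
    # Backtrack exactly as before: the far side can only help if the
    # splitting plane is closer than the current best candidate.
    if diff * diff < squared_distance(best, query):
        best = nearest_in_leaf_tree(far, query, best)
    return best
```

The inner nodes never need a data point for this to work: the pruning test uses only the distance from the query to the splitting plane.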

Also, as k-d-trees are not dynamic, when simulating deletions via tombstones, the inner node may only contain a tombstone (representing a deleted object).

Has QUIT--Anony-Mousse
  • Wikipedia's case is the default case. The book I mentioned before splits on the median point (at depth i we compare the i-th coordinate; when i exceeds the number of coordinates, we reset it to 0 and start over) and keeps the split in the inner node. For example, in 2D, the split is actually a line, so you store its coordinate. The difference is that we have data points only at the leaves. And my question was (and still is): how do we backtrack in this case when searching for the NN? – gsamaras Apr 12 '14 at 00:41
  • Where is the problem in backtracking? You have the same spatial layout as if you had elements in the inner nodes too, don't you? I don't see how the math would change. – Has QUIT--Anony-Mousse Apr 12 '14 at 18:45