I'm taking a data structures course, and recently learned about the k-d tree, which partitions the space by dividing it with a hyperplane.

I'm wondering whether a data structure exists that partitions the space into "inner" and "outer" regions using hyperspheres, where each hypersphere is centered on a node and reaches out to its nearest node. This would allow for a more intuitive way to find the nodes closest to a given point within a spherical region.

Each node would need to track its position in the space, a radius (its distance to the nearest node at insertion time), and its inner and outer child nodes.
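A minimal sketch of such a node and its insertion, under a few stated assumptions: the names `SphereNode` and `insert` are hypothetical, the new node's radius is taken to be its distance to the node it ends up attached to, and the root's radius (which has no parent to measure against) is assumed to be fixed by the first point inserted after it.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SphereNode:
    """Hypothetical node: a point, a splitting radius, and two children."""
    position: Sequence[float]
    radius: Optional[float] = None       # unset only for a freshly created root
    inner: Optional[SphereNode] = None   # points with distance <= radius
    outer: Optional[SphereNode] = None   # points with distance > radius


def insert(root: Optional[SphereNode], point: Sequence[float]) -> SphereNode:
    """Insert `point` and return the (possibly new) root."""
    if root is None:
        return SphereNode(position=point)
    node = root
    while True:
        d = math.dist(point, node.position)
        if node.radius is None:
            # Assumption: the root's radius is set by the first point after it.
            node.radius = d
        side = "inner" if d <= node.radius else "outer"
        child = getattr(node, side)
        if child is None:
            # The new node's radius is its distance to its parent.
            setattr(node, side, SphereNode(position=point, radius=d))
            return root
        node = child


root = None
for p in [(0, 0), (3, 4), (10, 0), (3, 3)]:
    root = insert(root, p)
```

In this example, `(3, 3)` ends up as the inner child of `(3, 4)` rather than of the root, because the descent stops at the last node visited, not at the globally nearest one.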

This naturally lends itself to a binary-tree-like structure in which each node has an inner and an outer child, and red-black trees (or other mechanisms) could keep it balanced. It would have an advantage over k-d trees in that a split does not have to be an axis-orthogonal hyperplane; every node instead splits the space with a hypersphere in exactly the same way.
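For the spherical-region lookup mentioned above, the inner/outer split lets a search skip whole subtrees using the triangle inequality, much as vantage-point trees do. The sketch below is only an illustration: `Node` and `range_search` are hypothetical names, the tree is built by hand, and it assumes every point in a node's inner subtree lies within that node's radius while every point in its outer subtree lies beyond it.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    # Hypothetical layout: `inner` holds points within `radius` of `position`,
    # `outer` holds the rest.
    position: Point
    radius: float
    inner: Optional[Node] = None
    outer: Optional[Node] = None


def range_search(node: Optional[Node], query: Point, r: float) -> List[Point]:
    """Collect every stored point within distance r of query."""
    if node is None:
        return []
    d = math.dist(query, node.position)
    found = [node.position] if d <= r else []
    # The query ball can reach inside the node's sphere only if d - r <= radius.
    if d - r <= node.radius:
        found += range_search(node.inner, query, r)
    # It can reach outside the node's sphere only if d + r > radius.
    if d + r > node.radius:
        found += range_search(node.outer, query, r)
    return found


# Hand-built example tree (2-D points chosen purely for illustration).
tree = Node((0.0, 0.0), 5.0,
            inner=Node((3.0, 4.0), 5.0, inner=Node((3.0, 3.0), 1.0)),
            outer=Node((10.0, 0.0), 10.0))

hits = range_search(tree, (3.0, 3.5), 1.0)
```

With a smaller query radius such as `0.2`, the query ball lies entirely inside the root's sphere, so the root's outer subtree is never visited.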

Similar data structures I've found include M-trees, but they differ substantially from what I had imagined: with a red-black tree as the underlying structure, there is no need to limit the number of circles in a region. There are also ball trees and vantage-point trees, but those differ as well.

Here is a (very crude) Google Slides presentation detailing some of what I had in mind.

What is this data structure called?

Noah A
  • I admit I don't understand the slides (why is B a parent of A? It should be the other way around. Why is the parent of D B, and not A? A is closer.) In any case, it looks a bit like a [Covertree](https://www.cs.ucr.edu/~cshelton/papers/docs/covertree.pdf) to me. Covertrees work well with high dimensionality; they index purely on the distance of points from each other. – TilmannZ Mar 30 '21 at 15:37
  • @TilmannZ I don't know where I implied that B is a parent of A (but please let me know!); every slide shows that B is a child of A. In this implementation I wanted to exploit the natural BST structure that arises with inner- and outer- children (such that every node has at most two children), which is why D's parent is B (B is the last node considered when calling nearest(D), even though A is closer). Admittedly, there are probably many different ways to construct this type of structure. Either way, this does appear to be some close cousin of a covertree, so thank you for the answer! – Noah A Mar 30 '21 at 21:51
  • Sorry, I cannot find the B-is-parent-of-A anymore; I must have misread that. I think I understand a bit better now. May I suggest illustrating the two key points with some diagrams: 1) the left/right side of the BST means something quite different from a normal BST (it means inside/outside); 2) unlike in most trees, the radius of a node is not what it contains, but the split between the left and right child. Also: is it required that the radius of a node equals the distance to its parent (if I got this right)? – TilmannZ Apr 01 '21 at 10:34
  • I also see now that this is different from a CoverTree, but if you intend to publish this (or performance test it), I suggest having a look at CoverTrees; they are the only tree type that I am aware of (except yours) that indexes on point distance rather than position. Covertrees are very fast for nearest neighbor search in high-dimensional space (1000+) but are very slow to construct. Self advertisement: if you are interested, I made a Java version for my [spatial index library](https://github.com/tzaeschke/tinspin-indexes). However, its implementation deviates a bit from the paper I referenced above. – TilmannZ Apr 01 '21 at 10:43
  • There are also indexes for annealing or approximate nearest neighbor searches, but I do not know much about them. Have a look at [this project](https://github.com/erikbern/ann-benchmarks). – TilmannZ Apr 01 '21 at 10:46
  • Finally, there is the [pyramid technique](http://cedric.cnam.fr/vertigo/Cours/GRBD/p142-berchtold.pdf). It uses a pyramid shaped splitting to deal with high dimensions, you may find that interesting. – TilmannZ Apr 01 '21 at 10:48
  • Thanks for the information and suggestions! Yes, for this structure it's required that the radius of a node is equal to the distance to its parent. I think you properly understand what I had in mind now. One thing that may be unclear: this structure would keep track of both distances and positions (in order to calculate the distances for future nodes). – Noah A Apr 02 '21 at 00:01
  • My experience with data structures is quite minimal as I am about halfway through an intro to data structures course, so I apologize if some things were unclear or unconventional. I honestly have no idea how to publish (I just started university) or properly performance test. I'll certainly look through the links, but there's no guarantee I will understand them haha. – Noah A Apr 02 '21 at 00:06
  • @TilmannZ One other thing: while the radius of a node is indeed the split between inner and outer children, this is essentially what is contained within the node (nodes inside are contained, nodes outside are not), but this may be a different definition of contains from what you meant. – Noah A Apr 02 '21 at 00:15
  • @TilmannZ Do you have an email where I can contact you more easily? I have more questions regarding publishing – Noah A Apr 02 '21 at 00:24
  • I have an email address on my [GitHub account](https://github.com/tzaeschke). Otherwise: zoodb at gmx dot de. – TilmannZ Apr 02 '21 at 11:23
