
I am trying to create a kd-tree with scipy's KDTree class, built from objects rather than pure coordinates. Each object has an (x, y) tuple, and the tree is based on this, but I would like to include the object itself as the node (or in the node).

Is there an "easy" approach to this? I had a look at "scipy kdtree with meta data", which says to use a third dimension as an object pointer(?). Wouldn't the tree then include this value when comparing neighbors? I am also in the same boat as this gentleman: I would like to avoid writing my own kd-tree for now.

PS. This is my first post, so be gentle with me ;)

1 Answer


The API of scipy's KDTree expects a 2D array of coordinates as input, not any sort of object array. In this array the rows are the points and the columns are the coordinates of those points.

In the question you link to, the poster doesn't mean that there is a third dimension but a third index. Suppose you are looking for a single nearest neighbor and you query using some point: the function will return a distance and an index. The index refers to a row in the array with which you built the tree. The distance is the distance between your query point and that tree point (Euclidean by default).
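A minimal sketch of the query behaviour described above, assuming a recent scipy and using made-up coordinates (the names `points`, `tree` and `nearest` are only for illustration):

```python
import numpy as np
from scipy.spatial import KDTree

# Illustrative coordinates: one row per point, one column per axis.
points = np.array([[0.0, 0.0],
                   [1.0, 1.0],
                   [5.0, 5.0]])
tree = KDTree(points)

# Ask for the single nearest neighbour of an arbitrary query point.
# With k=1 the result is one distance and one integer index.
distance, index = tree.query([0.9, 1.2], k=1)

# The index points back into the array the tree was built from.
nearest = points[index]
```

Note that asking for more than one neighbour (`k > 1`) returns arrays of distances and indices instead of scalars.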

So to use this tree you could keep two arrays: one with the object coordinates and a second one with the objects themselves. They should be in the same order, so that when a query returns an index, it refers to the same object in both arrays.
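The two-array idea could look roughly like this. It is only a sketch: the `Item` class, its `name` and `pos` fields, and the `nearest_item` helper are hypothetical stand-ins for your own objects, not part of scipy.

```python
from dataclasses import dataclass

import numpy as np
from scipy.spatial import KDTree


@dataclass
class Item:
    """Hypothetical object type carrying an (x, y) position."""
    name: str
    pos: tuple


# The objects, in a fixed order.
items = [
    Item("a", (0.0, 0.0)),
    Item("b", (2.0, 1.0)),
    Item("c", (5.0, 4.0)),
]

# The coordinates, extracted in the same order as `items`.
coords = np.array([item.pos for item in items], dtype=float)
tree = KDTree(coords)


def nearest_item(xy):
    """Return the object whose position is closest to xy."""
    _, idx = tree.query(xy)  # idx is a row number in coords
    return items[idx]        # same position in the object list


found = nearest_item((1.8, 1.2))
```

The tree only ever sees `coords`. If `items` is reordered or modified afterwards, the coordinate array and the tree would need to be rebuilt so that the indices still line up.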

PS. This is my first answer, so also be gentle :D

digital_jb
  • Thank you for your answer! I managed to solve it earlier today, just the way you described. As efficiency is what I am after, I might have to build a better-suited tree to optimize my case, but it works alright as of now. The I/O between the lists is costing some clock cycles, and could be avoided :/ – Finn Olav Sagen Feb 09 '21 at 19:07
  • NP! Just for your consideration: queries are fast when points are close to each other / aligned in memory (such as with a numpy array). Interleaving with object metadata will slow down the queries. Splitting into two arrays may cost fewer clock cycles than the loss of query speed with a different tree layout. – digital_jb Feb 09 '21 at 20:36