
I'm looking for an efficient algorithm to find the vertex nearest to a point P(x, y, z). The set of vertices is fixed; each request comes with a new point P. I tried k-d trees and other well-known methods, and I hit the same problem everywhere: if P is close, all is fine and the search touches only a few tree nodes. However, the farther away P is, the more nodes have to be scanned, and eventually the speed becomes unacceptably slow. In my task I have no way to specify a small search radius. What are the solutions for such a case?

Thanks, Igor

Igors
  • How many points are in the k-d tree? Also, k-d trees are a pretty standard solution for this problem; I'm surprised they're not fast enough for you. Are you sure the problem isn't elsewhere, or in the k-d tree implementation? – templatetypedef Jan 14 '13 at 17:42
  • The number of points varies (it depends on the user's model) and is usually small, around 2-5K. But I have a huge number of queries (millions). – Igors Jan 14 '13 at 18:15
  • Example: the vertices are on a sphere with R = 100 and the query point P is at the sphere's center. The tree divides the points, and the distance to the divider is 100, so both halves must be checked. At the next division the distance is again 100, so both halves must be scanned there as well. In the end all (or almost all) vertices are checked. If P is closer to the boundary things go better, but a lot of points are still checked. Of course, if P is near the surface the tree is fast, but I must compute distances for far points too. – Igors Jan 14 '13 at 18:23
  • Ah, spherical data is the worst-case input to a k-d tree. :-) – templatetypedef Jan 14 '13 at 18:29
  • Are you guaranteed that the data is spherical? Also, are all test points near the center, or just some of them? – templatetypedef Jan 14 '13 at 18:31
  • >> Are you guaranteed that the data is spherical? No, that's just a worst-case example. But I have a lot of "far" points where this problem appears (more or less). – Igors Jan 14 '13 at 18:46

2 Answers


One possible way to speed up your search would be to discretize space into a regular grid of rectangular prisms. For example, you could split space up into lots of 1 × 1 × 1 unit cubes and then distribute the points into these volumes. This gives you a sort of "hash function" that maps each point to the volume containing it.

Once you have done this, do a quick precomputation step and find, for each of these volumes, the closest nonempty volumes. You could do this by checking all volumes one step away from the volume, then two steps away, etc.

Now, to do a nearest neighbor search, you can do the following. Start off by hashing your point in space to the volume that contains it. If that volume contains any points, iterate over all of them to find which one is closest. Then, for each of the nearby nonempty volumes you found during the precomputation step, iterate over their points to see if any of them are closer. The resulting closest point is the nearest neighbor to your test point.

If your volumes end up containing too many points, you can refine this approach by subdividing those volumes into even smaller volumes and repeating this same process. You could alternatively create a bunch of smaller k-d trees, one for each volume, to do the nearest-neighbor search. In this way, each k-d tree holds a much smaller number of points than your original k-d tree, and the points within each volume are all reasonable candidates for a nearest neighbor. Therefore, the search should be much, much faster.
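
A minimal sketch of the basic grid in Python (the cell size `CELL`, the shell-by-shell scan, and all names here are my own illustration, not the answer's exact method; for far-away queries the shell scan is exactly the part that the precomputed closest-nonempty-volume table would replace):

```python
import math
from collections import defaultdict

CELL = 1.0  # cell edge length; tune to the data density

def cell_of(p):
    """Integer grid coordinates of the cell containing point p."""
    return tuple(math.floor(c / CELL) for c in p)

def build_grid(points):
    """Hash every point into its containing cell."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p)].append(p)
    return grid

def nearest(grid, q):
    """Scan cells in growing cubic shells around q's cell.

    Once a candidate is found, keep expanding until no closer point can
    exist in an unscanned shell: any point in shell r is at least
    (r - 1) * CELL away from q. Assumes the grid is nonempty.
    """
    cq = cell_of(q)
    best, best_d = None, float("inf")
    r = 0
    while best is None or (r - 1) * CELL < best_d:
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    if max(abs(dx), abs(dy), abs(dz)) != r:
                        continue  # only visit the new shell at radius r
                    cell = (cq[0] + dx, cq[1] + dy, cq[2] + dz)
                    for p in grid.get(cell, ()):
                        d = math.dist(p, q)
                        if d < best_d:
                            best, best_d = p, d
        r += 1
    return best
```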

This setup is similar in spirit to an octree, except that you divide space into a bunch of smaller regions rather than just eight.

Hope this helps!

templatetypedef
  • OK, let's subdivide into small cubes. We look (fast) for the cube where P is located. Oops, there is no such cube. We look at the 26 adjacent cubes: none found. Now what? – Igors Jan 15 '13 at 06:55
  • @Igors- The idea is that you would partition all of space into cubelets, not just the parts of space that contain points in your test space. That way, you always do hit a cube. As for hitting 26 adjacent cubes - assuming that your cubes aren't too big and each contains a small k-d tree, my guess is that the performance would not be too bad. – templatetypedef Jan 15 '13 at 06:56
  • Filling the whole volume with cubes can be dramatically expensive. And even if it isn't, what if an empty cube is found? Refer to its parent until it contains some points? That gains nothing (a lot of nodes still have to be scanned). How about a regular buffer (maybe an A-buffer)? But I have no idea how to build one here. Thoughts? – Igors Jan 15 '13 at 16:17
  • @Igors- My initial idea was that you precompute the nearest nonempty cube for each cube, so that if you find an empty cube you can easily and immediately jump to all its nearest neighbors. As for memory usage, this should be tunable by just adjusting the sizes. My sense is that you're not liking this idea, though. I haven't heard of A-buffers or regular buffers before, so you might be best asking a separate question about them. – templatetypedef Jan 15 '13 at 16:33
  • You're right, I should not mix themes. Also I realize that not every question has a simple answer. Thx for your help – Igors Jan 17 '13 at 09:13

Well, this is not an issue of the index structures used, but of your query:

the nearest neighbor just becomes much fuzzier the further away you are from your data set.

So I doubt that any other index will help you much.

However, you may be able to plug a threshold into your search, i.e. "find the nearest neighbor, but only when it is within a maximum distance x".
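
For instance, SciPy's k-d tree already supports such a cutoff via its `distance_upper_bound` parameter (a sketch, assuming SciPy is available; the sentinel check follows SciPy's documented behavior):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((5000, 3))  # a few thousand fixed vertices, as in the question
tree = cKDTree(points)

# "Find the nearest neighbor, but only within a maximum distance x":
# branches farther than the bound are pruned instead of descended into.
d, i = tree.query([10.0, 10.0, 10.0], k=1, distance_upper_bound=0.5)
if i == tree.n:  # SciPy returns index n and d = inf if nothing is within the bound
    print("no neighbor within 0.5")
else:
    print("nearest:", points[i], "at distance", d)
```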

For static, in-memory, 3-d double-precision point data with Euclidean distance, the k-d-tree is actually hard to beat. It just splits the data very, very fast. An octree may sometimes be faster, but mostly for window queries, I guess.

Now if you really have very few objects but millions of queries, you could try a hybrid approach. Roughly something like this: compute all points on the convex hull of your data set, along with its center and radius. Whenever a query point is x times further away than that radius (you need to do the 3-d math yourself to figure out the correct x), its nearest neighbor must be one of the convex hull points. Then again use a k-d-tree, but one containing the hull points only.
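
A rough sketch of that hybrid, assuming SciPy's `ConvexHull` and `cKDTree` (the multiplier `FAR` is a placeholder; as noted above, you have to derive the safe factor yourself):

```python
import numpy as np
from scipy.spatial import ConvexHull, cKDTree

rng = np.random.default_rng(1)
points = rng.random((3000, 3))

hull_pts = points[ConvexHull(points).vertices]  # extreme points of the data set
center = points.mean(axis=0)
radius = np.linalg.norm(points - center, axis=1).max()

full_tree = cKDTree(points)
hull_tree = cKDTree(hull_pts)  # far smaller than the full tree

FAR = 3.0  # placeholder; the correct factor needs the 3-d math mentioned above

def nearest(q):
    q = np.asarray(q, dtype=float)
    if np.linalg.norm(q - center) > FAR * radius:
        d, i = hull_tree.query(q)  # far query: the NN must be a hull point
        return hull_pts[i], d
    d, i = full_tree.query(q)      # near query: search the full tree
    return points[i], d
```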

Or even simpler: find the min/max point in each dimension, which gives you six candidate points. Maybe add some additional extremes (in x+y, x-y, x+z, x-z, y+z, y-z, etc.). Either way you get a small candidate set; let's assume six points for now. Precompute the center of these candidates and their distances from it, and let m be the maximum of those distances. For a query, compute the distance to the center. If it is larger than m, first find the closest of the six candidates, then query the k-d-tree with the search bounded to that candidate's distance. This costs you 1 distance computation for a close query and 7 for a far one (the center plus the six candidates), and it may significantly speed up your search by providing a good candidate early. For further speedups, organize these 6-26 candidates in a k-d-tree, too, to find the best bound quickly.
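
A sketch of this candidate-bound scheme with the six axis extremes, again assuming SciPy (the diagonal extremes would be appended to `cand` the same way; taking `center` as the mean of the candidates is my own choice):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
points = rng.random((3000, 3))
tree = cKDTree(points)

cand = np.concatenate([points[points.argmin(axis=0)],   # min in x, y, z
                       points[points.argmax(axis=0)]])  # max in x, y, z
center = cand.mean(axis=0)
m = np.linalg.norm(cand - center, axis=1).max()

def nearest(q):
    q = np.asarray(q, dtype=float)
    if np.linalg.norm(q - center) <= m:  # 1 distance computation
        return tree.query(q)             # close query: plain k-d search
    # Far query: 6 more distance computations give a tight upper bound...
    bound = np.linalg.norm(cand - q, axis=1).min()
    # ...which lets the tree prune almost everything beyond it. The tiny
    # slack keeps the bounding candidate itself from being cut off.
    return tree.query(q, distance_upper_bound=bound * (1 + 1e-9))
```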

Erich Schubert
  • Oh, and double-check and benchmark your code extensively. For example, you can implement a k-d-tree without having a `node` type. And you can save all but one square-root computation, too! – Erich Schubert Jan 20 '13 at 11:34