4

I'm looking for an efficient algorithm that, for a space with known height, width and length, a fixed radius R, and a list of N points with 3-dimensional coordinates in that space, will find all the points within radius R of an arbitrary query point in the space. This query will be done many times with different query points, so an expensive pre-processing/sorting step in exchange for quick queries may be worth it. This is a bit of a bottleneck step of an application I'm working on, so any time I can cut off of it is useful.

Things I have tried so far:

- The naive algorithm: iterate over all points and calculate the distance to each.

- Divide the space into a grid of cubes with side length R and put the points into these buckets. That way, for each query point, I only ever have to look in the immediately neighboring buckets. This gives a significant speedup (a sketch of this bucket grid appears below).

- I've tried using the Manhattan distance as a heuristic. That is, within the buckets, before calculating the full distance to any point, use the Manhattan distance to filter out those that can't possibly be within radius R (that is, those with a Manhattan distance greater than sqrt(3)*R). I thought this would offer a speedup, as it only needs additions instead of multiplications, but it actually slowed the program down a little.

EDIT: To compare the distances, I use the squared distance to eliminate having to use a sqrt function.
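For concreteness, here is a minimal sketch of the bucket-grid query described above, using the squared distance so no sqrt is ever taken. The struct layout, the flat cell indexing, and the output-array interface are my own assumptions (grid construction is omitted), not code from the question.

```c
#include <stddef.h>

/* A point in the space. */
typedef struct { double x, y, z; } Point;

/* Grid of cubic cells of side R; each cell holds indices into the point list. */
typedef struct {
    int nx, ny, nz;     /* number of cells along each axis          */
    double cell;        /* cell side length, equal to R             */
    int *counts;        /* counts[c] = number of points in cell c   */
    int **cells;        /* cells[c]  = array of point indices       */
} Grid;

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Write into out[] the indices of all points within radius r of q.
   Because the cell side equals R, only the 27 cells around q's cell matter. */
size_t query_radius(const Grid *g, const Point *pts, Point q, double r,
                    int *out, size_t out_cap)
{
    double r2 = r * r;                       /* compare squared distances only */
    int cx = clampi((int)(q.x / g->cell), 0, g->nx - 1);
    int cy = clampi((int)(q.y / g->cell), 0, g->ny - 1);
    int cz = clampi((int)(q.z / g->cell), 0, g->nz - 1);
    size_t found = 0;

    for (int ix = clampi(cx - 1, 0, g->nx - 1); ix <= clampi(cx + 1, 0, g->nx - 1); ix++)
    for (int iy = clampi(cy - 1, 0, g->ny - 1); iy <= clampi(cy + 1, 0, g->ny - 1); iy++)
    for (int iz = clampi(cz - 1, 0, g->nz - 1); iz <= clampi(cz + 1, 0, g->nz - 1); iz++) {
        int c = (ix * g->ny + iy) * g->nz + iz;    /* flat cell index */
        for (int k = 0; k < g->counts[c]; k++) {
            int i = g->cells[c][k];
            double dx = pts[i].x - q.x, dy = pts[i].y - q.y, dz = pts[i].z - q.z;
            if (dx*dx + dy*dy + dz*dz <= r2 && found < out_cap)
                out[found++] = i;
        }
    }
    return found;
}
```

The Manhattan pre-filter from the last bullet would slot in just before the squared-distance test in the inner loop.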

Obviously, there will be some limit on how much I can speed this up, but I could use any suggestions on things to try now.

Not that it probably matters on the algorithmic level, but I'm working in C.

gms7777
  • `sqrt()` is an expensive function. To speed things up you should just square the other side and compare. – corn3lius Jun 15 '12 at 15:36
  • I already do that. Will edit the question to reflect that, thanks. – gms7777 Jun 15 '12 at 15:42
  • I'm finding the problem statement difficult to parse. Are you trying to find the subset of points (in the list you called N) that are within R of an arbitrary point? What are "elements"--the points from N? – Adrian McCarthy Jun 15 '12 at 17:31
  • That seems right. I have edited to make the terminology within the question more self-consistent. – gms7777 Jun 15 '12 at 19:43
  • I'm still not clear. What changes from query to query? The list of points? The center of the sphere? The radius? All of the above? You talk about preprocessing in order to optimize for multiple queries, but it's not clear what's invariant. – Adrian McCarthy Jun 15 '12 at 19:46
  • Ah, now I understand. The center of the sphere is what changes. The list of points stays constant – gms7777 Jun 15 '12 at 19:51

6 Answers

3

Don't compare on the radius, compare on the square of the radius. The reason is that if the distance between two points is less than R, then the square of the distance is less than R^2.

This way, when you're using the distance formula, you don't need to compute the square root, which is a very expensive operation.
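As a tiny illustration of this answer (my own helper function, not the asker's code):

```c
/* Returns nonzero if point p lies within radius r of center c, without any sqrt. */
int within_radius(double px, double py, double pz,
                  double cx, double cy, double cz, double r)
{
    double dx = px - cx, dy = py - cy, dz = pz - cz;
    return dx*dx + dy*dy + dz*dz <= r*r;   /* compare squared quantities */
}
```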

tskuzzy
3

You may get a speed benefit from storing your points in a k-d tree with three dimensions. That will give you searches in O(log n) amortized time.
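A rough sketch of what the radius query on such a tree looks like. Tree construction (median splitting) is omitted, and the node layout and callback interface are my own assumptions, not any particular library's API:

```c
typedef struct KDNode {
    double pt[3];               /* point stored at this node              */
    int axis;                   /* splitting axis: 0 = x, 1 = y, 2 = z    */
    struct KDNode *left, *right;/* left holds coords <= pt[axis]          */
} KDNode;

/* Visit every stored point within radius r of q; report() is called per hit. */
void kd_radius_search(const KDNode *n, const double q[3], double r,
                      void (*report)(const double pt[3]))
{
    if (!n) return;

    double dx = n->pt[0] - q[0], dy = n->pt[1] - q[1], dz = n->pt[2] - q[2];
    if (dx*dx + dy*dy + dz*dz <= r*r)
        report(n->pt);

    double d = q[n->axis] - n->pt[n->axis];   /* signed distance to split plane */
    /* Always descend the side containing q; only cross the plane when the
       query sphere actually reaches the other side.                        */
    kd_radius_search(d <= 0 ? n->left : n->right, q, r, report);
    if (d*d <= r*r)
        kd_radius_search(d <= 0 ? n->right : n->left, q, r, report);
}
```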

Mike Dinescu
1

I would recommend using either a k-d tree or a Z-order curve: http://en.wikipedia.org/wiki/Z-order_%28curve%29
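For the Z-order curve idea, the core operation is interleaving the bits of the quantized x, y, z cell coordinates so that nearby cells tend to receive nearby keys; the points can then be sorted by that key. A small sketch using the standard bit-spreading trick (the 10-bits-per-axis limit is my assumption):

```c
#include <stdint.h>

/* Spread the lower 10 bits of v so there are two zero bits between each
   original bit (standard step for building a 3D Morton code). */
static uint32_t spread3(uint32_t v)
{
    v &= 0x3FF;
    v = (v | (v << 16)) & 0x030000FF;
    v = (v | (v <<  8)) & 0x0300F00F;
    v = (v | (v <<  4)) & 0x030C30C3;
    v = (v | (v <<  2)) & 0x09249249;
    return v;
}

/* 30-bit Morton key for integer cell coordinates (each must be < 1024). */
uint32_t morton3(uint32_t x, uint32_t y, uint32_t z)
{
    return spread3(x) | (spread3(y) << 1) | (spread3(z) << 2);
}
```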

sega_sai
1

How about a Binary Indexed Tree? (See the TopCoder tutorials.) It can be extended to n dimensions and is simpler to code.
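For reference, this is the basic 1-D structure that answer refers to; extending it to 3 dimensions means nesting the same index arithmetic over three coordinates, and it answers box-count queries, so a sphere query would still need a final distance check. A minimal sketch of my own, not code from the answer:

```c
/* 1-D Fenwick / Binary Indexed Tree over positions 1..n.
   tree[] must have n+1 entries, initialized to 0. */
void bit_add(int *tree, int n, int i, int delta)
{
    for (; i <= n; i += i & -i)   /* climb to parents responsible for i */
        tree[i] += delta;
}

/* Sum of counts at positions 1..i. */
int bit_prefix_sum(const int *tree, int i)
{
    int s = 0;
    for (; i > 0; i -= i & -i)    /* strip lowest set bit each step */
        s += tree[i];
    return s;
}
```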

Arvind
1

Nicolas Brodu's NEIGHAND library does exactly what you want, improving on the bin-lattice algorithm.

More details can be found in his article: Query Sphere Indexing for Neighborhood Requests

Gigi
0

[I might be misunderstanding the question. I'm finding the problem statement difficult to parse.]

In the old days, it was often good to design this type of algorithm with "early outs": tests that try to avoid a more expensive calculation. On modern processors, a branch-prediction failure is often very expensive, and those early-out tests can actually be more expensive than the full calculation. (The only way to know for sure is to measure.)

In this case, the calculation is pretty simple, so it may be best to avoid building a data structure or doing any clever early-out checks and instead try to optimize, vectorize, and parallelize to get the throughput you need.

For a point P(x, y, z) and a sphere S(x_s, y_s, z_s, radius), the membership test is:

(x - x_s)^2 + (y - y_s)^2 + (z - z_s)^2 < radius^2

where radius^2 can be pre-calculated once for all the points in the query (avoiding any square root calculations). These calculations are all independent, so you can compute them for several points in parallel. With something like SSE, you could probably do four at a time. And if you have many points to test, you could split the list and further parallelize the work across multiple cores.
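A straightforward way to express that is plain scalar C that a compiler can auto-vectorize, with an optional OpenMP pragma to spread the points across cores. The structure-of-arrays layout and function name here are my own assumptions, not code from the answer:

```c
#include <stddef.h>

/* Mark, for every point, whether it lies inside the query sphere.
   xs/ys/zs hold coordinates in structure-of-arrays form, which keeps the
   inner loop friendly to SIMD auto-vectorization.                       */
void sphere_membership(const float *xs, const float *ys, const float *zs,
                       size_t n, float cx, float cy, float cz, float radius,
                       unsigned char *inside)
{
    const float r2 = radius * radius;        /* precomputed once per query    */

    #pragma omp parallel for                  /* ignored if OpenMP is disabled */
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++) {
        float dx = xs[i] - cx, dy = ys[i] - cy, dz = zs[i] - cz;
        inside[i] = (dx*dx + dy*dy + dz*dz < r2);
    }
}
```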

Adrian McCarthy
    That certainly works. That is what I referred to as the naive algorithm. I'm sure that if I was only doing 1 query, it would be the fastest. The problem I have is I'm doing 10k-100k queries over 100k-1m points. This turns into a very expensive step, even when parallelized as much as possible. So if there is a data structure that could speed up individual queries, it may be faster, even if the initial construction is expensive. – gms7777 Jun 15 '12 at 19:50
  • I see. I was confused about the problem. I thought it was the list of points that was varying from query to query. Thanks for clarifying. – Adrian McCarthy Jun 15 '12 at 22:05