6

I know this may be a duplicate, but it seems like a variation on the 'Closest pair of Points' algorithm.

Given a set of N points (x, y) in the unit square and a distance d, find all pairs of points such that the distance between them is at most d.

For large N the brute force method is not an option. Besides the 'sweep line' and 'divide and conquer' methods, is there a simpler solution? These pairs of points are the edges of an undirected graph that I need to traverse to decide whether it is connected (which I already did using DFS, but when N = 1 million it never finishes!).

Any pseudocode, comments or ideas are welcome, Thanks!

EDIT: I found this in Sedgewick's book (I'm looking at the code right now):

Program 3.18 uses a two-dimensional array of linked lists to improve the running time of Program 3.7 by a factor of about 1/d^2 when N is sufficiently large. It divides the unit square up into a grid of equal-sized smaller squares. Then, for each square, it builds a linked list of all the points that fall into that square. The two-dimensional array provides the capability to access immediately the set of points close to a given point; the linked lists provide the flexibility to store the points where they may fall without our having to know ahead of time how many points fall into each grid square.
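The grid idea described in that passage could be sketched roughly as below. This is not Sedgewick's Program 3.18: it is a minimal Python sketch that assumes a cell side equal to d, uses a dictionary of lists instead of a two-dimensional array of linked lists, and swaps the DFS for a union-find to answer the connectivity question. The names `pairs_within` and `is_connected` are made up for illustration.

```python
import math
from collections import defaultdict


def pairs_within(points, d):
    """Yield index pairs (i, j), i < j, whose points are at most d apart."""
    cell = d  # assumption: cell side equals d, so only adjacent cells matter
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)

    for (cx, cy), members in grid.items():
        # Look at the cell itself and its 8 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbours = grid.get((cx + dx, cy + dy))
                if not neighbours:
                    continue
                for i in members:
                    for j in neighbours:
                        # i < j reports each unordered pair exactly once.
                        if i < j and math.dist(points[i], points[j]) <= d:
                            yield (i, j)


def is_connected(points, d):
    """Union-find over the edges; no recursion, so no deep DFS stack."""
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    components = len(points)
    for i, j in pairs_within(points, d):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components <= 1
```

For example, `is_connected(points, 0.01)` on a list of `(x, y)` tuples only ever compares points that fall in the same or adjacent cells, which is where the saving over brute force comes from when d is small.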

Fernando

2 Answers

2

We are really looking for points that are inside of a circle of center (x,y) and radius d.

The square that encloses the circle is a square of center (x,y) and sides 2d. Any point outside this square does not need to be checked, it's out. So, a point a (xa, ya) is out if abs(xa - x) > d or abs(ya - y) > d.

The same reasoning applies to the square inscribed in that circle: it has center (x,y) and diagonals 2d, so its half-side is d / 1.414. Any point inside this square does not need to be checked, it's in. So, a point a (xa, ya) is in if abs(xa - x) < (d / 1.414) and abs(ya - y) < (d / 1.414).

Combined, those two easy rules greatly reduce the number of points that need a full distance calculation. If we sort the points by their x, keep only the candidates whose x is within d, then sort those by their y and filter again, we are left with the ones for which we really need to calculate the full distance.
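A rough sketch of those two rules in Python, assuming the inner square has half-side d / sqrt(2) and that the points are kept in a list sorted by x. The helper names `within` and `neighbours` are hypothetical, and only the x filter uses binary search here; the y test is done per point.

```python
import math
from bisect import bisect_left, bisect_right


def within(center, p, d):
    """True if p lies within distance d of center."""
    dx = abs(p[0] - center[0])
    dy = abs(p[1] - center[1])
    if dx > d or dy > d:
        return False              # outside the enclosing square: out
    half = d / math.sqrt(2)       # half-side of the inscribed square
    if dx <= half and dy <= half:
        return True               # inside the inscribed square: in
    return dx * dx + dy * dy <= d * d  # in between: full distance test


def neighbours(sorted_pts, xs, center, d):
    """sorted_pts is sorted by x; xs holds the x-coordinates in the same order."""
    lo = bisect_left(xs, center[0] - d)
    hi = bisect_right(xs, center[0] + d)
    return [p for p in sorted_pts[lo:hi] if within(center, p, d)]
```

With `pts = sorted(points)` and `xs = [p[0] for p in pts]`, calling `neighbours(pts, xs, (0.5, 0.5), 0.1)` only touches the points whose x lies in [0.4, 0.6].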

Tony Amaro
0

For any given point, you can use a Manhattan distance (x-delta plus y-delta) heuristic to filter out most of the points that are not within the distance "d" - filter out any points whose Manhattan distance is greater than (sqrt(2) * d), then run the expensive-and-precise distance test on the remaining points.
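As a small illustration of that filter (a sketch only, with a made-up function name `close_points`): any point within Euclidean distance d has Manhattan distance at most sqrt(2) * d, so anything above that limit can be skipped before the exact test.

```python
import math


def close_points(center, points, d):
    """Return the points within Euclidean distance d of center."""
    limit = math.sqrt(2) * d
    result = []
    for p in points:
        dx = abs(p[0] - center[0])
        dy = abs(p[1] - center[1])
        if dx + dy > limit:
            continue  # Manhattan distance too large: cannot be within d
        if dx * dx + dy * dy <= d * d:  # exact test on the survivors
            result.append(p)
    return result
```

Note that this still visits every point once per query, so on its own it makes each comparison cheaper but does not reduce the number of comparisons.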

Zim-Zam O'Pootertoot
  • If a point's coordinates are [x, y] then you can eliminate all points whose x-coordinates are less than (x-d) or whose x-coordinates are greater than (x+d), and all points whose y-coordinates are less than (y-d) or whose y-coordinates are greater than (y+d). This should greatly reduce the number of pairwise comparisons you'll need to perform. – Zim-Zam O'Pootertoot Apr 05 '13 at 18:17
  • Put the points into buckets. Let's say you're on a 100x100 coordinate grid, you'll create around 200 buckets: a bucket for points whose x-coordinates are 0 <= x < 1, another for points whose x-coordinates are 1 <= x < 2, etc, and another 100 buckets for the y-coordinate. Now let's say your d = 5 and you have a point at [10,20], you'll take all of the points in the x-coordinate buckets from 5 to 15 and intersect them with the points in the y-coordinate buckets from 15 to 25. – Zim-Zam O'Pootertoot Apr 05 '13 at 18:30
  • Be sure to watch for corner cases, you'll probably want to take the x-buckets from 4 to 16 and the y-buckets from 14 to 26 just to be on the safe side. – Zim-Zam O'Pootertoot Apr 05 '13 at 18:31
  • To be clear, each point will be indexed in two buckets: an x-coordinate bucket and a y-coordinate bucket. – Zim-Zam O'Pootertoot Apr 05 '13 at 18:31
  • I think I get the idea. This will be faster than O(N^2) because I will only look at the right buckets for each one of the N points? Thanks for your patience LOL – Fernando Apr 05 '13 at 18:36
  • That's the idea - by indexing the points you don't need to do pairwise comparisons for all of them. – Zim-Zam O'Pootertoot Apr 05 '13 at 18:39
  • Instantiate a struct for each point, and use an array of arrays for your buckets. [Here is a set intersection algorithm I found](http://www.c-program-example.com/2012/01/c-program-to-find-two-sets-intersection.html) – Zim-Zam O'Pootertoot Apr 06 '13 at 22:07
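The bucket scheme from the comments above might look something like this in Python. It is only a sketch: sets of point indices stand in for the structs and arrays of arrays, the bucket `width` is an assumed parameter, and `build_buckets` and `candidates` are hypothetical names. The bucket range is computed from the actual values x-d and x+d, which is one way to cover the edge cases mentioned in the comments.

```python
import math
from collections import defaultdict


def build_buckets(points, width):
    """Index every point twice: once by x-bucket and once by y-bucket."""
    xb, yb = defaultdict(set), defaultdict(set)
    for idx, (x, y) in enumerate(points):
        xb[math.floor(x / width)].add(idx)
        yb[math.floor(y / width)].add(idx)
    return xb, yb


def candidates(xb, yb, point, d, width):
    """Indices of points whose x and y both fall in buckets overlapping [v-d, v+d]."""
    def collect(buckets, v):
        lo = math.floor((v - d) / width)
        hi = math.floor((v + d) / width)
        found = set()
        for b in range(lo, hi + 1):
            found |= buckets.get(b, set())
        return found

    x, y = point
    return collect(xb, x) & collect(yb, y)
```

With the numbers from the comments (width 1, d = 5, point [10, 20]) this takes the x-buckets 5 to 15 and the y-buckets 15 to 25. The result is a superset of the true neighbours, so the exact distance test still has to be run on it. Each x-strip or y-strip can hold many points, so a two-dimensional grid of cells may end up doing less work than intersecting two strips.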