-1

Q) Find the closest pair of points in a given set of points.

Expected time complexity: O(nlog2(n)) / O(nlog(n))

i was reading up this problem from here. However i am unable to understand a few things:

  1. what do we achieve by sorting points by Y which are already sorted by X?

  2. when we take a point as mid from one of the halves, dont we miss out on a few cases? i.e. how does taking a mid point ensure that the strip formed includes all the points that are candidates for a distance less than the distance returned by the two recursive calls?

  3. even if the above point is proved, shouldnt we take the mid point of both halves i.e. m of left half and (m+1) of right half?

keemahs
  • 780
  • 11
  • 14

2 Answers2

2

2) You use the x coordinate of the mid point to find a line that splits the points in half. The recursive calls compare all pairs that don't cross that line, so all that remains is to compare pairs that do cross the line. If two points are less than d apart, and they're on either side of the line, then they must both be less than d away from the line.

3) Well, that will be done in the recursive calls. I don't know what else you could mean.

1) The points in the strip are separated out and sorted. Then, given one point in the strip, all the points with y coordinate within d are in a contiguous region around it. These are the only other points you need to compare it with, and the sorting makes it quick and easy to find these points. The magic of this algorithm is that the number of possible points in any such region is strictly limited to a constant value. This follows from the fact (established by the recursive calls) that no pair of points on either side of the line are closer than d.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
1

If I understand this correctly, the following point is actually a bit misleading:

3) Recursively find the smallest distances in both subarrays. [...]

I think it should say: Split the subarrays recursively until they contain a small number of points (maybe 2 or 3? It doesn't say) and then find the smallest distance in each of this small strips. Then we proceed with 4)-7) and apply a y-strip search for every border between all subarrays that we created.

Here is my understanding of what happens, it doesn't mention array splitting because I think this only confuses what is going on:

1) Sort all points by x. Walk over the sorted array and calculate the distance for each consecutive pair of points, ie. each point and the one that follows it. We get a minimum distance 'd'.

2) Sort all point by y. Again, for each point we calculate the distance to its direct following neighbor(s) in the sorted array. Unlike '1)', we do not only consider the direct neighbors, but all neighbors that are less than 'd' ahead of the current point. When we find a pair with distance smaller 'd', we use this new distance as our new 'd' from this moment on.

This should give us the smallest d.

Basically, splitting an array recursively until only pairs remain (which is what I guess they imply) is in effect the same as simply walking over the array and comparing each consecutive point pair. There is one difference in my step 1) though: they only calculate the x-distance of every 2nd point to its successor (there is a subarray border after every 2nd point). This may help (because they are doing only half the work) or be detrimental (because they may no get the best d from this first pass).

I haven't verified this, please let me know if I understood this wrong or if there are problems with my 'translated' approach.

TilmannZ
  • 1,784
  • 11
  • 18
  • i cant fully understand what you say...but 2nd and 1st points of @Matt Timmermans is the crux i.e. when we find solutions for left and right half we miss out on set of points which are such that one is in left half and one in right. There must be a geometrical proof which states that (IN THE MIDDLE STRIP, IF YOU TAKE ALL POINTS OF Y DISTANCE LESS THAN D(THE MIN FROM BOTH HALVES FROM ALL POINTS THEN FOR ALL POINTS A CONSTANT NUMBER OF POINTS ARE CHECKED). – keemahs Jun 04 '20 at 00:16
  • 1
    I assume "recursive" means that we should split the strips repeatedly (=recursively) into smaller strips until each strip contains only very few (=2 ?) points. Say we create *m* strips, that means we have *m-1* borders, so we have *m-1* "middle" strips to process. In fact, since strips are so narrow, a "middle" strip may have to cover more than one strip on each side (shortest distance could be across more than one border). If we would only process one middle strip, we would have missing points. In effect, the many "middle" strips cover the whole area, I don't think we miss any points. – TilmannZ Jun 04 '20 at 15:46