Is it possible to find the k pairs of closest points in a set of n points faster than O(n^2)?

I know that I can calculate the closest pair of points in O(nlogn), but with that algorithm, not all of the distances are calculated, so I cannot return the top k closest pairs of points.

This problem is trivial with the brute-force method of computing the distance between every pair of points, but that requires n(n-1)/2 distance computations, i.e. O(n^2) time, and I would like to find something more efficient.
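For reference, the brute-force baseline described above can be sketched as follows. This is only an illustrative sketch: the function name `k_closest_pairs_brute` is made up for this example, points are assumed to be 2D tuples, and `math.dist` assumes Python 3.8 or later. It still examines all n(n-1)/2 pairs, so it runs in O(n^2 log k) time, but it keeps only k candidates in memory.

```python
import heapq
from itertools import combinations
from math import dist  # Python 3.8+

def k_closest_pairs_brute(points, k):
    """Return the k closest pairs as (distance, p, q) tuples, nearest first."""
    # Lazily enumerate all n*(n-1)/2 unordered pairs with their distances.
    candidates = ((dist(p, q), p, q) for p, q in combinations(points, 2))
    # nsmallest keeps at most k candidates at a time while scanning.
    return heapq.nsmallest(k, candidates)

pts = [(0, 0), (1, 0), (5, 5), (5, 7), (10, 10)]
closest = k_closest_pairs_brute(pts, 2)
```

Any faster approach should return the same pairs as this baseline, which makes it useful as a correctness check.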

Edit: See the closest pairs algorithm here: https://www.geeksforgeeks.org/closest-pair-of-points-using-divide-and-conquer-algorithm/

  • Re: "I can calculate the closest pair of points in O(nlogn), but with that algorithm, not all of the distances are calculated, so I cannot return the top k closest pairs of points": Could you provide details about the algorithm you're referring to? (Or a link to the details?) – ruakh Mar 04 '19 at 19:53
  • `closest points` how many dimensions? – greybeard Mar 05 '19 at 09:54
  • (The obvious extension would seem to be to use a *priority queue* instead of a *min candidate*.) – greybeard Mar 06 '19 at 06:03
  • There are only 2 dimensions (x,y). – Zach Romano Mar 07 '19 at 04:06

1 Answer

One viable option for small k would be to run your O(nlogn) algorithm repeatedly on subsets of the original set of points, removing points as you go. More specifically, keep a "min set" of the points that have already formed a minimal pair. Each time you need the next closest pair, make two queries and take the closer result: (a) the closest pair lying entirely within the "min set", and (b) the closest pair found when each point of the "min set", one at a time, is combined with the rest of the original points.

To start, remove all but one of the "min set" points from the original set, and calculate the closest pair with the O(nlogn) algorithm. Repeat this for each of the points in the "min set", and keep the overall closest pair. This takes O(j*nlogn) time when the "min set" is of size j.

Then, query for the next closest pair within this "min set" via a find-min (O(1) time) on a min-max heap of size k that we'll build as we add points to our "min set". Every time we add points to our "min set", we'll calculate the distance between each point in "min set" (size j) and these (up to) 2 new points, insert these into our min-max heap, and then delete as many elements as necessary to make the heap size k again (at most 2j elements) in O(jlogk) time.
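The heap update described above might look roughly like the sketch below. Note the substitution: Python's standard library has no min-max heap, so this sketch uses `heapq` with negated distances as a max-heap capped at k entries. That keeps the O(jlogk) update cost, but finding the minimum then needs a scan instead of O(1). The names `add_to_min_set`, `min_set`, and `heap` are hypothetical, chosen only for this illustration.

```python
import heapq
from math import dist  # Python 3.8+

def add_to_min_set(min_set, heap, new_points, k):
    """Add new_points to min_set, keeping only the k shortest in-set pairs.

    heap stores (-distance, p, q), so heap[0] is the longest retained pair.
    """
    for p in new_points:
        if p in min_set:
            continue  # point was already paired earlier
        for q in min_set:
            entry = (-dist(p, q), p, q)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                # New pair is shorter than the longest retained pair.
                heapq.heapreplace(heap, entry)
        min_set.append(p)

min_set, heap = [], []
add_to_min_set(min_set, heap, [(0, 0), (1, 0)], k=2)
add_to_min_set(min_set, heap, [(5, 5), (5, 7)], k=2)
retained = sorted(-neg_d for neg_d, _, _ in heap)
```

Each new point is compared against the j points already in the set, and every push or replace costs O(logk), which is where the O(jlogk) bound comes from.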

Now, take the closer of these two pairs (deleting it from the heap if that is where it came from, in O(logk) time), add its points to the "min set", update the min-max heap as described, and repeat until k pairs have been found. Overall, this will take O((k^2)nlogn + (k^2)logk + klogk) = O((k^2)(nlogn + logk)) = O((k^2)nlogn) time.
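The total can be checked by summing the per-round costs. This is a sketch under two stated assumptions: the "min set" never holds more than 2k points (each reported pair adds at most two), so its size j_i in round i satisfies j_i <= 2k; and k <= n(n-1)/2, so logk = O(logn), which is what lets the last step drop the logk term.

```latex
\begin{aligned}
T(n,k) &= \sum_{i=1}^{k} \Big( O(j_i\, n\log n) + O(j_i \log k) + O(\log k) \Big),
          \qquad j_i \le 2k \\
       &= O(k^2\, n\log n) + O(k^2 \log k) + O(k \log k) \\
       &= O\big(k^2 (n\log n + \log k)\big) = O(k^2\, n\log n)
\end{aligned}
```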

Dillon Davis
  • Okay, I'm a little confused still, is this right: 1) find the first 2 points that make up the closest pair 2) remove one of those points from the set, then run the closest point again (nlogn) 3) check the point removed in step two against the remaining n-1 points 3.1) if k>1, then must check all points (kn) 4) repeat k times building my set (k) Is this right? Wouldn't this be k*(nk + nlogn)? Thanks for the quick reply! – Zach Romano Mar 05 '19 at 03:14
  • @ZachRomano think of it this way: you have two kinds of points, those that are unpaired and those that have been paired at least once. The latter make up the "min set". So in step 2, you are "adding back" each point from the "min set" (including previously removed points) to the others, in turn (to see if any of them form a closest pair with the others). This takes knlogn, not just nlogn. You could use some fancy data structures here to avoid recomputing some of these, but it would get a bit hairy. – Dillon Davis Mar 05 '19 at 03:40
  • I am a bit lost on steps 3.1 and 4 though. Are you talking about the 3rd paragraph? To rephrase, you will query a min-max heap for the closest pair that can be made entirely from within the "min set" (the paired/removed points). This only takes O(1) time, and if it's smaller than the other closest pair, we'll remove it, which takes O(logk) time. However, we need to build the min-max heap first in order to use it. So when we remove a pair of points / add them to the "min set", we'll also see what pairs they form with existing points in the "min set". We'll add these, and shrink the heap to size k. – Dillon Davis Mar 05 '19 at 03:45
  • This will leave our heap with just the smallest k pairs which can be formed entirely within the "min set", which is exactly what we require. This takes klogk time. Putting this all together, and repeating k times, gives us k(knlogn + klogk) = k^2(nlogn + logk) = k^2nlogn time. – Dillon Davis Mar 05 '19 at 03:48