
I have two disjoint sets of points in 3D. I need to find the k pairs of points (one point from each set) with the smallest distances. Each point has (x, y, z) coordinates.

Constraints: The solution has to be an optimal serial solution. No multithreading, please. Approaches such as divide and conquer or dynamic programming can be used.

My current approach is:

listOfPairs = []
for all points a in setA
    for all points b in setB
        distance = calcDistance(a, b)
        listOfPairs.append((a, b, distance))

sortByDistance(listOfPairs) // using the built-in sort method
PrintPointsAndDistances(listOfPairs, k) // print the first k elements

Thanks.

Arjun C
  • What about a k-D tree? – meowgoesthedog Oct 12 '18 at 09:52
  • @meowgoesthedog: a kD-tree might be a useful ingredient, but you need to elaborate on that. The use for that problem is not straightforward. – Oct 12 '18 at 11:58
  • My bad. I was thinking of adding the larger set of points to a k-D tree and performing [nearest neighbor search](https://en.wikipedia.org/wiki/K-d_tree#Nearest_neighbour_search) for each point in the smaller set, i.e. `O((n + m) log (n + m))` instead of `O(nm)` (using quicksort-style k-min). – meowgoesthedog Oct 12 '18 at 12:07
  • @meowgoesthedog, thanks. Can you please elaborate more? I looked at the k-D Tree and still cannot figure out how I can use it to find the K pairs of atoms with the minimum distance. – Arjun C Oct 12 '18 at 12:18
  • @ArjunC the linked Wikipedia section describes an algorithm to find the closest point in a k-D tree to some arbitrary query point, in roughly logarithmic time for randomly distributed points. Searching around on Google will more than likely get you many existing implementations (like [this one](https://rosettacode.org/wiki/K-d_tree#Python)). (This assumes that `k` is much smaller than the total number of pairs `~ nm`.) – meowgoesthedog Oct 12 '18 at 12:33

1 Answer


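This can be done with a bounded priority queue. Keep the double loop you already have, but instead of storing every pair, push each one into a queue that holds at most k entries and discards the largest distance whenever it overflows: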

priorityQueue = PriorityQueue(k) // of size k
for all points a in setA
    for all points b in setB
        distance = calcDistance(a, b)
        priorityQueue.push_with_priority((a, b), distance)

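What you are left with are the k pairs with the shortest distances, and the algorithm runs in O(N*log(k)) time, where N = |setA| * |setB| is the total number of pairs. It also needs only O(k) memory instead of O(N).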
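
As a rough illustration of the pseudocode above, here is a minimal C++ sketch built on std::priority_queue used as a max-heap. The names Point, PairDist, ByDist, squaredDistance and kClosestPairs are made up for this example and are not from the question. It also assumes that comparing squared distances is acceptable, which preserves the ordering and avoids a square root per pair.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct Point { double x, y, z; };

// One candidate pair: squared distance plus indices into setA and setB.
struct PairDist {
    double dist2;
    std::size_t ia, ib;
};

// Comparator that keeps the LARGEST distance on top of the heap.
struct ByDist {
    bool operator()(const PairDist& l, const PairDist& r) const {
        return l.dist2 < r.dist2;
    }
};

double squaredDistance(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns up to k pairs with the smallest distances, in ascending order.
std::vector<PairDist> kClosestPairs(const std::vector<Point>& setA,
                                    const std::vector<Point>& setB,
                                    std::size_t k) {
    if (k == 0) return {};
    std::priority_queue<PairDist, std::vector<PairDist>, ByDist> heap;
    for (std::size_t i = 0; i < setA.size(); ++i) {
        for (std::size_t j = 0; j < setB.size(); ++j) {
            const double d2 = squaredDistance(setA[i], setB[j]);
            if (heap.size() < k) {
                heap.push({d2, i, j});          // heap not full yet
            } else if (d2 < heap.top().dist2) {
                heap.pop();                     // evict the current worst pair
                heap.push({d2, i, j});
            }
        }
    }
    // Popping yields largest first, so fill the result from the back.
    std::vector<PairDist> result(heap.size());
    for (std::size_t n = heap.size(); n-- > 0;) {
        result[n] = heap.top();
        heap.pop();
    }
    return result;
}
```

Checking against heap.top() before pushing means most pairs are rejected with a single comparison once the heap is full, so the log(k) cost is only paid for pairs that actually make it into the current best k.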

Bob Bills
  • Thanks. Do you know how I can limit the size of the priority queue in C++? – Arjun C Oct 13 '18 at 11:44
  • Seems like you would need to make your own. You can wrap a std::set and, on each push, if the size exceeds k elements, delete the largest one with set.erase(std::prev(set.end())). Note that set.end() itself is past the last element and must not be erased. Erasing through an iterator is specified as amortized constant time. – Bob Bills Oct 13 '18 at 12:39
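
A small sketch of the wrapper idea from the last comment, under a few assumptions: the class name BoundedMinSet and its members are hypothetical, and it uses std::multiset rather than std::set so that two entries that compare equal are not silently merged. The element type only needs operator<, so something like std::pair<double, int> (distance first) would work.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>

// Hypothetical helper: keeps only the `capacity` smallest values pushed so far.
template <typename T>
class BoundedMinSet {
public:
    explicit BoundedMinSet(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void push(const T& value) {
        items_.insert(value);                       // O(log k)
        if (items_.size() > capacity_) {
            // Drop the largest element; end() is past-the-end, so step back one.
            items_.erase(std::prev(items_.end()));
        }
    }

    // Elements in ascending order.
    const std::multiset<T>& items() const { return items_; }

private:
    std::size_t capacity_;
    std::multiset<T> items_;
};
```

Compared with a heap, this keeps the retained elements sorted at all times, at the cost of one node allocation per insert.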