
Input:

N points {P1, …, PN}, all of the same dimension t:

  • Pi = {x_1, …, x_t}, where t is between 18 and 30.

Distance function: dist(Pi, Pj) returns a number that is the distance between the two points. (It is a custom function, not a standard Minkowski distance.)

The Problem:

Main problem:

  • Find the K closest pairs from all the N points – as fast as possible.
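For reference, a brute-force baseline for the main problem can be sketched as below. It costs O(N^2) distance evaluations, so it is only useful for checking a faster solution on small inputs. The `dist` function here is a placeholder (Manhattan distance), chosen purely for illustration, because the real custom distance is not shown in the question; the helper name `k_closest_pairs_bruteforce` is also hypothetical.

```python
import heapq
from itertools import combinations


def dist(p, q):
    # Placeholder for the custom distance (assumption): Manhattan distance.
    return sum(abs(a - b) for a, b in zip(p, q))


def k_closest_pairs_bruteforce(points, k, dist=dist):
    """Return the k closest pairs as (distance, i, j) with i < j, nearest first."""
    candidates = (
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    # nsmallest keeps only k candidates in memory while scanning all pairs.
    return heapq.nsmallest(k, candidates)


points = [(0, 0), (1, 0), (5, 5), (5, 6), (10, 10)]
closest = k_closest_pairs_bruteforce(points, 2)
```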

Secondary problem:

  • Given a point Q = {x_1, …, x_t}, return the K points closest to Q (that is, the K closest pairs (Q, Pi)).

Nice to Have:

  • A database to which we can add or remove a point Pi while the above queries still run as fast as possible.

Relevant Data structures:

KD-TREE

R-TREE

BALL-TREE

Possible Solutions:

Main problem:

  • Build a BallTree (sklearn.neighbors.BallTree).

  • For every point P in the BallTree, find its K nearest neighbors. (Now we have N lists, where list i contains the K closest pairs involving point Pi.)

  • Take the best K pairs from all the lists above.
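The three steps above can be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: `custom_dist` is a stand-in (Manhattan distance) for the real custom function, and the helper name `k_closest_pairs_balltree` is hypothetical. It passes the Python callable through scikit-learn's `pyfunc` metric, which is slow because every distance evaluation goes back into Python. It also assumes the custom distance is a true metric (in particular, that it satisfies the triangle inequality), since the ball tree's pruning relies on that.

```python
import heapq

import numpy as np
from sklearn.neighbors import BallTree


def custom_dist(a, b):
    # Stand-in for the custom distance (assumption): Manhattan distance.
    return float(np.abs(a - b).sum())


def k_closest_pairs_balltree(X, k, dist=custom_dist):
    """Return the k closest pairs as (distance, i, j) with i < j, nearest first."""
    tree = BallTree(X, metric="pyfunc", func=dist)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    n_neighbors = min(k + 1, len(X))
    distances, indices = tree.query(X, k=n_neighbors)

    # Each pair can be found from both endpoints, so key it by (min, max) index.
    pairs = {}
    for i in range(len(X)):
        for d, j in zip(distances[i], indices[i]):
            j = int(j)
            if i != j:
                pairs[(min(i, j), max(i, j))] = float(d)

    return heapq.nsmallest(k, ((d, i, j) for (i, j), d in pairs.items()))


X = np.array([[0, 0], [1, 0], [5, 5], [5, 6], [10, 10]], dtype=float)
closest = k_closest_pairs_balltree(X, 2)
```

Asking each point for its K nearest neighbors is enough because any pair among the global K closest must also be among the K nearest neighbors of each of its endpoints (ignoring ties).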

Secondary problem:

  • Build a BallTree (sklearn.neighbors.BallTree).

  • Query the tree for the K nearest neighbors of the given point Q.
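A minimal sketch of the query for a single point Q, under the same caveats: `custom_dist` is a placeholder (Manhattan distance) for the real custom function, `k_nearest_to_query` is a hypothetical helper name, and the distance is assumed to be a true metric.

```python
import numpy as np
from sklearn.neighbors import BallTree


def custom_dist(a, b):
    # Stand-in for the custom distance (assumption): Manhattan distance.
    return float(np.abs(a - b).sum())


def k_nearest_to_query(tree, q, k):
    """Return [(distance, index), ...] for the k stored points nearest to q."""
    # BallTree.query expects a 2-D array, one row per query point.
    distances, indices = tree.query(np.asarray(q, dtype=float).reshape(1, -1), k=k)
    return [(float(d), int(i)) for d, i in zip(distances[0], indices[0])]


X = np.array([[0, 0], [1, 0], [5, 5], [5, 6], [10, 10]], dtype=float)
tree = BallTree(X, metric="pyfunc", func=custom_dist)
nearest = k_nearest_to_query(tree, [4, 5], k=2)
```

The tree is built once and can be reused for many queries, so the build cost is amortized.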

Time Complexity So far:

  • For every point in the tree (N in total), finding its K nearest neighbors takes O(K * log(N)), so O(N * K * log(N)) in total.

  • Taking the best K pairs from the N sorted lists can take O(max{K * log(K), N}), for example by maintaining a min-heap of size K.
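One way to do the selection step is a lazy k-way merge of the N sorted lists with `heapq.merge`, stopping after K distinct pairs. The sketch assumes each list holds `(distance, neighbor_index)` tuples sorted by distance; that layout and the name `best_k_pairs` are illustrative assumptions, not part of the original setup. Each pair can show up in at most two lists, so roughly 2K items are pulled from a heap over N list heads, which should be about O(N + K * log(N)) for this step.

```python
import heapq


def best_k_pairs(neighbour_lists, k):
    """neighbour_lists[i] holds (distance, j) tuples for point i, sorted by distance.

    Returns the k closest distinct pairs as (distance, i, j) with i < j.
    """

    def pairs_from(i, neighbours):
        # Normalise each pair so (i, j) and (j, i) compare equal.
        for d, j in neighbours:
            yield (d, min(i, j), max(i, j))

    streams = [pairs_from(i, nbrs) for i, nbrs in enumerate(neighbour_lists)]

    seen = set()
    result = []
    # heapq.merge lazily yields items from all streams in ascending distance order.
    for d, i, j in heapq.merge(*streams, key=lambda item: item[0]):
        if (i, j) in seen:
            continue  # the same pair was already reported from its other endpoint
        seen.add((i, j))
        result.append((d, i, j))
        if len(result) == k:
            break
    return result


lists = [
    [(1.0, 1), (10.0, 2)],
    [(1.0, 0), (9.0, 2)],
    [(1.0, 3), (9.0, 1)],
    [(1.0, 2), (9.0, 4)],
    [(9.0, 3), (10.0, 2)],
]
top3 = best_k_pairs(lists, 3)
```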

The total complexity, for now, is O(N * K * log(N)). Can we do better?

Mike M
  • Have you tried something already? Where are you stuck? – Dominik Jul 31 '18 at 14:13
  • @Dominik As I mentioned, I'm trying to solve the main problem as fast as possible. I added a time complexity section; does it help? – Mike M Aug 01 '18 at 06:48
  • It sounds more like a Maths problem than a Programming Problem, you may want to ask that on the Maths Stackoverflow Site. – Dominik Aug 01 '18 at 09:45
  • @Dominik I'll ask there as well, but my intention is to find a faster solution for the above problem that really works - that is why I attached the current data structure I'm using (sklearn.neighbors.BallTree). So, the section of the time analysis is just for intuition - I'm just looking for a solution which will run faster. – Mike M Aug 01 '18 at 11:18
