
The gist is... we have two sets of points, A and B, each containing the same number of points, n.

Formal Problem:

Construct a minimum-cost complete bipartite matching between the points of A and B, where the cost of matching a to b is distance(a, b). Does there exist an algorithm faster than O(n^3)?

Notes:

  • Every point a in A is matched to exactly one point b in B (the matching is complete).
  • Equivalently, every point of A or B appears in exactly one matched pair (a, b).
  • sum( distance(a, b) over all matched pairs (a, b) ) is minimized.

Example:

  • Point a (0,0)
  • Point b (2,0)
  • Point c (0,1)
  • Point d (-2,2)
  • Set Z {a, d}
  • Set Y {b, c}

Solution:

Matching 1: (a, b) (d, c)

sum(distance(a, b), distance(d, c)) = 2 + sqrt(5) ~ 4.24

Matching 2: (a, c) (d, b)

sum(distance(a, c), distance(d, b)) = 1 + sqrt(20) ~ 5.47

Matching 1 (a, b) (d, c) is the min cost complete bipartite matching!
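
As a sanity check, here is a minimal brute-force sketch (plain Python, coordinates taken from the example above; it enumerates all n! permutations, so it is only meant for tiny inputs) that confirms matching 1 is optimal:

```python
# Brute-force verification of the worked example above (O(n!) -- tiny inputs only).
from itertools import permutations
from math import dist  # Python 3.8+

Z = {"a": (0, 0), "d": (-2, 2)}   # set Z = {a, d}
Y = {"b": (2, 0), "c": (0, 1)}    # set Y = {b, c}

z_names, y_names = list(Z), list(Y)
best = min(
    (sum(dist(Z[z], Y[y]) for z, y in zip(z_names, perm)), list(zip(z_names, perm)))
    for perm in permutations(y_names)
)
print(best)  # (4.236..., [('a', 'b'), ('d', 'c')]) -> matching 1 wins
```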

  • Have you considered casting this as a [maximum flow](https://en.wikipedia.org/wiki/Maximum_flow_problem) problem? – Richard Apr 18 '17 at 18:26
  • Yup. Used a min-cost max-flow algorithm to solve this in _O(n^3)_ time. The flow network had _n^2_ edges and _2n_ vertices (plus a source and a sink). Looking for a faster algorithm! How can I optimize the min-cost max-flow knowing a priori that every vertex lies in Euclidean space and every edge weight is a Euclidean distance? – Michael Petrochuk Apr 18 '17 at 18:32
  • I have no useful input or experience to provide, but this seems a rather gnarly problem to solve. Are you certain that you cannot get away with an approximation in your particular application? I imagine that a greedy matching can't be too far off for graphs of a size where O(N^3) becomes a burden. I apologize if this is a purely theoretical exercise, but you did post here rather than at https://cs.stackexchange.com – doynax Apr 18 '17 at 18:42
  • @doynax For my application, an approximation works if it's bounded. I'd need to know how good the approximation is. The accepted answer lists some gnarly algorithms for this. – Michael Petrochuk Apr 18 '17 at 20:24
  • Are you planning to implement this? Where do the points come from, or how are they distributed? – tmyklebu Apr 18 '17 at 22:32
  • @tmyklebu Yes. I have implemented this with [Google or-tools](https://developers.google.com/optimization/flow/mincostflow). The two sets being compared are sets of color vectors; each vector is a color in the LAB space. The Euclidean distance in the LAB color space is defined as the color difference between two vectors. A set of colors is defined as a color palette, and the Euclidean bipartite minimum cost is defined as the difference between the two color palettes. – Michael Petrochuk Apr 18 '17 at 23:23
  • How big are the sets? – tmyklebu Apr 19 '17 at 02:47
  • @tmyklebu So for the current application, it's sets of 3 to 10 (simple color palettes), but I need to do 100,000 comparisons with different sets of 3 to 10. I'm also looking at another application of the algorithm where it's sets of 100,000 (every pixel in an image). – Michael Petrochuk Apr 19 '17 at 04:19
  • OK. For the smaller instances, I don't think the geometry will help you. For sets of 100000, perhaps try building a sparse graph where each point is connected to its 10 (or so) nearest neighbours and running a good min-cost matching code. Another heuristic, suggested by that Agarwal-Varadarajan paper, is to snap your input points to a relatively coarse grid and find the matching there. A trick: The union of two matchings is a collection of even cycles; you're allowed to pick the cheaper half of each even cycle. – tmyklebu Apr 20 '17 at 02:37
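
Below is a hedged sketch of the k-nearest-neighbour heuristic from the last comment, using SciPy: cKDTree for the neighbour queries and scipy.sparse.csgraph.min_weight_full_matching for an exact matching on the sparsified graph. The library choice, k = 10, and the random test data are illustrative assumptions, not part of the thread, and the k-NN graph may not admit a perfect matching at all, in which case you would fall back to the dense solver.

```python
# Sketch: k-nearest-neighbour sparsification + exact matching on the sparse graph.
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import min_weight_full_matching

def approx_min_cost_matching(A, B, k=10):
    """A, B: (n, d) arrays of points. Returns (rows, cols, total_cost)."""
    n = len(A)
    k = min(k, n)
    # For each point of A, find its k nearest points of B.
    dists, nbrs = cKDTree(B).query(A, k=k)
    rows = np.repeat(np.arange(n), k)
    # Sparse bipartite graph; coincident points (zero distance) may need a
    # tiny epsilon so their edge is not confused with a missing entry.
    graph = csr_matrix((dists.ravel(), (rows, nbrs.ravel())), shape=(n, n))
    # Raises ValueError if the sparse graph has no perfect matching;
    # a real implementation would then fall back to the dense solver.
    r, c = min_weight_full_matching(graph)
    return r, c, np.linalg.norm(A[r] - B[c], axis=1).sum()

rng = np.random.default_rng(0)
A = rng.random((10_000, 3))   # e.g. LAB colour vectors, as in the comments
B = rng.random((10_000, 3))
print(approx_min_cost_matching(A, B)[2])
```

Note that this only approximates the optimum: the true minimum-cost matching may need an edge that is not among any point's k nearest neighbours.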

1 Answer


Yes. If the distances between vertices are taken as the weights of the edges between them, then this is a weighted bipartite graph. The task of finding the maximum/minimum weight matching is known as the assignment problem.
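
For moderate n, a practical baseline is simply to build the dense n × n distance matrix and hand it to an off-the-shelf assignment solver. The sketch below uses SciPy's linear_sum_assignment (a Hungarian/Jonker-Volgenant style solver) with random 3-D "palette" vectors as stand-in data; both choices are illustrative assumptions, not something prescribed by the references below.

```python
# Dense assignment-problem baseline: worst-case O(n^3), but fast in practice
# for palette-sized inputs (n on the order of 3-10, or even a few thousand).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(42)
A = rng.random((10, 3))                   # e.g. one colour palette in LAB space
B = rng.random((10, 3))                   # the palette to compare against

cost = cdist(A, B)                        # pairwise Euclidean distances
rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
print(list(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum())
```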

The problem can be solved in O(|V|(|E| + |V| log |V|)) time using the Fibonacci-heap-based shortest-path machinery developed in Fredman and Tarjan (1987).

Further improvements are possible in Euclidean spaces, as discussed here. Notably, Vaidya (1988) presents an O(n^(2.5) log n) algorithm for the L1, L2, and L∞ metrics, which improves to O(n² (log n)³) for the L1 and L∞ metrics. Agarwal, Efrat, and Sharir (2006, Section 7) improve on this, giving an algorithm that runs in O(n^(2+ε)) time.

You can do even better if you sacrifice exactness. Agarwal and Varadarajan (2004) present a probabilistic algorithm which, given a value 0 < ε < 1, finds in O(n^(1+ε)) time a matching whose expected cost is within a multiplicative factor of O(log(1/ε)) of the optimum.

Do your points happen to lie on the edges of a convex polygon? Then Marcotte and Suri (1991) will be of interest: they present an exact O(n log n) algorithm for that. If the polygon is non-convex, but still simple, then you could use their O(n² log n) algorithm in the same paper.

Richard