I have K sets of data points, and I would like to form groups of size K that minimize the total sum of intra-group distances. I'm familiar with matching algorithms on bipartite graphs, but I need this for more than two sets.

Any ideas?

Edit:

Each group consists of one element from each set, with no repetitions allowed.

An example: you have {a1, a2, a3}, {b1, b2, b3}, {c1, c2, c3}. You want to create groups, e.g. {a1, b3, c3}, {a2, b1, c2}, {a3, b2, c1}, that minimize the sum of intra-group distances.

user3091275
  • Sounds like k-means clustering? If that's the case it's NP-hard. Nice question! – molamk Feb 21 '19 at 23:32
  • Do the groups need to have one element from each of the K sets? Do all the sets have the same number of points? How is distance defined here? Is this physical distance in 2 or 3 dimensions or something else? – Dave Feb 22 '19 at 01:59
  • Normalise your vectors, group them at some resolution in the n-sphere, then expand, treating the centres of your previous groups as individual vectors. Not optimal, but a useful approximation. – teppic Feb 22 '19 at 02:36
  • Yeah, I was thinking of some heuristics where we would iteratively apply bipartite matching with the current centroids of groups being built and a selected set. – user3091275 Feb 22 '19 at 07:06

2 Answers

This problem can be reduced to a similar one that I have solved for another Stack Overflow question. The idea is to compute all combinations of n / k sized groups and weight each one by its intra-group distances. Then traverse the search space for valid combinations of these groups, keeping a record of the minimal sum found so far and using it to prune dead-end branches. You can speed up the search with dynamic programming, by producing optimal subsets of the solution and building up to the final solution from them (as described in my other post), or you could use a greedy method and some hand-wavy tricks to find a nearly optimal (or optimal) solution (also described in said post). Here is a link to the subproblem that you can reduce this to.
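To make the pruning idea concrete, here is a minimal sketch of an exhaustive search with a best-so-far bound. It is not the code from the linked post. It assumes points are coordinate tuples, that distance is Euclidean, that every set has the same number of points, and that the cost of a group is the sum of its pairwise distances. The names `group_cost` and `best_grouping` are made up for illustration.

```python
from itertools import combinations, product
from math import dist, inf

def group_cost(points):
    """Sum of pairwise Euclidean distances within one group (assumed cost)."""
    return sum(dist(p, q) for p, q in combinations(points, 2))

def best_grouping(sets):
    """Exhaustive search with pruning over K equally sized sets of points."""
    n = len(sets[0])
    best = {"cost": inf, "groups": None}
    # used[j][i] is True once point i of sets[j + 1] belongs to a group
    used = [[False] * n for _ in sets[1:]]

    def extend(g, partial_cost, groups):
        if partial_cost >= best["cost"]:
            return  # prune: this branch cannot beat the best solution so far
        if g == n:
            best["cost"], best["groups"] = partial_cost, [list(m) for m in groups]
            return
        # Group g always contains sets[0][g]; pick one free point from each other set.
        free = [[i for i in range(n) if not u[i]] for u in used]
        for choice in product(*free):
            members = [sets[0][g]] + [sets[j + 1][i] for j, i in enumerate(choice)]
            for j, i in enumerate(choice):
                used[j][i] = True
            groups.append(members)
            extend(g + 1, partial_cost + group_cost(members), groups)
            groups.pop()
            for j, i in enumerate(choice):
                used[j][i] = False

    extend(0, 0.0, [])
    return best["cost"], best["groups"]
```

The search space grows roughly like (n!)^(K-1), so this is only practical for very small instances; it is mainly useful as a reference to check heuristics against.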

Dillon Davis

Even for k=3 it has the flavor of the NP-hard problem 3-dimensional matching. (The obvious reduction doesn't work because there may be phantom triples created where each of the three pairs of an invalid triple appears separately in a valid triple.)

Depending on the size of the instance, I would try either local search or integer programming with column generation (but the inner problem seems hard without the structure of a low-dimensional metric space, and nontrivial even then).
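For the local search option, one possible neighbourhood is swapping the two points that a pair of groups took from the same set. The sketch below is only an illustration under assumptions: points are coordinate tuples, distance is Euclidean, all sets have equal size, and the group cost is the sum of pairwise distances. `local_search` is a hypothetical name, and the result is a local optimum with no guarantee of being the global one.

```python
from itertools import combinations
from math import dist

def group_cost(points):
    """Sum of pairwise Euclidean distances within one group (assumed cost)."""
    return sum(dist(p, q) for p, q in combinations(points, 2))

def local_search(sets):
    """Swap-based local search; returns (total cost, groups)."""
    n = len(sets[0])
    # groups[g][j] is the point group g takes from sets[j]; start index-aligned.
    groups = [[s[g] for s in sets] for g in range(n)]
    improved = True
    while improved:
        improved = False
        for g, h in combinations(range(n), 2):
            for j in range(len(sets)):
                before = group_cost(groups[g]) + group_cost(groups[h])
                # Tentatively exchange the two members that came from sets[j].
                groups[g][j], groups[h][j] = groups[h][j], groups[g][j]
                after = group_cost(groups[g]) + group_cost(groups[h])
                if after < before - 1e-12:
                    improved = True  # keep the swap
                else:
                    # Not a strict improvement: undo the swap.
                    groups[g][j], groups[h][j] = groups[h][j], groups[g][j]
    return sum(group_cost(g) for g in groups), groups
```

Each accepted swap strictly lowers the total cost, so the loop terminates. Restarting from several random initial assignments and keeping the best result is a common way to reduce the risk of a poor local optimum.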

David Eisenstat