15

Basically, I have a number of values that I need to split into n different groups so that the sums of the groups are as close to one another as possible. The list of values isn't terribly long, so I could potentially just brute-force it, but I was wondering if anyone knows of a more efficient method of doing this. Thanks.

Cyborg771
  • 395
  • 1
  • 6
  • 17
  • 1
    Isn't this a variation of the subset-sum problem and potentially not computable in polynomial time? – sc_ray Mar 09 '11 at 16:42
  • Nice problem, although proposing a solution is difficult because the criteria for a "good" solution can be defined in many different ways (and since a perfect solution might not exist, you need to define a measure of quality). For example, if I come up with n-1 buckets that are all equal but the last bucket is WAY off, is that better than n/2 buckets that are all different, but within a fixed known bound of each other? – davin Mar 09 '11 at 16:43
  • 3
    I think it's actually (a generalization of) the partition problem, https://secure.wikimedia.org/wikipedia/en/wiki/Partition_problem – Fred Foo Mar 09 '11 at 16:43
  • @sc_ray, I don't think it necessarily is, because we can allow an acceptable error in accuracy, which allows for easy optimisations, like bucketing values – davin Mar 09 '11 at 16:45
  • This is also somewhat similar to the multi-processor scheduling problem (http://en.wikipedia.org/wiki/Multiprocessor_scheduling). There, the quantity to be minimised is the greatest sum of any group. Here, you don't specify what measure is used to mean "as close as possible", but presumably the mean deviation, variance, or some such. I suspect that an approximate solution to one usually isn't too bad as an approximate solution to the other, especially if you're currently uncertain how to define "as close as possible". – Steve Jessop Mar 09 '11 at 16:46
  • @Steve: "as close as possible" is well-defined, it means minimal difference between the sums. – Fred Foo Mar 09 '11 at 16:54
  • Hi, if you assign each element's value as its weight and you estimate a maximum value for each group (for example, the average of the sum of all the elements), maybe you can use the knapsack algorithm in a recursive way... or not... what do you think? – SubniC Mar 09 '11 at 16:55
  • Is `n` fixed? I.e., do you need `n` groups regardless of whether `n-1` groups would give you a "better" partition? – Jacob Mar 09 '11 at 16:57

5 Answers

9

If an approximate solution is enough, then sort the numbers in descending order, loop over them, and assign each number to the group with the smallest sum.

# Assumes `numbers` is the list of values and NUM_GROUPS the number of groups.
groups = [[] for _ in range(NUM_GROUPS)]

# Greedy: place each number, largest first, into the currently lightest group.
for x in sorted(numbers, reverse=True):
    mingroup = groups[0]
    for g in groups:
        if sum(g) < sum(mingroup):
            mingroup = g
    mingroup.append(x)
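
Calling sum(g) in the inner loop recomputes every group total on each pass; a heap of running totals gives the same greedy placement without the recomputation. A minimal sketch of that variant (the function name greedy_partition is my own, not part of the answer):

import heapq

def greedy_partition(numbers, num_groups):
    # One heap entry per group: (running_sum, tiebreak_index, members).
    heap = [(0, i, []) for i in range(num_groups)]
    heapq.heapify(heap)
    for x in sorted(numbers, reverse=True):
        total, i, group = heapq.heappop(heap)  # group with the smallest sum so far
        group.append(x)
        heapq.heappush(heap, (total + x, i, group))
    return [group for _, _, group in heap]

print(greedy_partition([90, 70, 50, 40, 30, 20], 3))  # e.g. [[90, 10...], ...]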
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • 6
    You'll get a better result if your numbers are sorted from largest to smallest; this spreads the largest numbers across the containers and then tries to balance things out using the small numbers. – Vatine Mar 09 '11 at 17:03
  • 1
    @Vatine: that's actually what I meant. Fixed the code example, thanks! – Fred Foo Mar 09 '11 at 19:45
  • No problem. In some quick tests, sort order made a small but noticeable difference when partitioning 300 random integers in the interval 1..30000 into 7 partitions: the gap between the min and max group sums came out on the order of "hundreds" instead of "tens". – Vatine Mar 09 '11 at 19:50
5

This problem is called the "multiway partition problem" and is indeed computationally hard. Googling for it turned up an interesting paper, "Multi-Way Number Partitioning", in which the author mentions the heuristic suggested by larsmans and proposes some more advanced algorithms. If the above heuristic is not enough, you may have a look at the paper, or maybe contact the author; he seems to be doing research in that area.
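
For the two-group case, one well-known step up from the greedy heuristic is Karmarkar-Karp differencing: repeatedly replace the two largest numbers with their difference (implicitly committing them to opposite groups) until one number remains; that number is the difference between the two group sums. A minimal sketch, assuming only the standard library (the function name kk_difference is mine, not from the answer):

import heapq

def kk_difference(numbers):
    # Max-heap via negation: repeatedly pop the two largest values and
    # push back their difference until a single value remains.
    heap = [-x for x in numbers]
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0  # achievable difference of the two sums

print(kk_difference([8, 7, 6, 5, 4]))  # 2, i.e. {8, 6} vs {7, 5, 4}

Note this returns only the difference; recovering the two groups themselves takes extra bookkeeping, and generalizing differencing to more than two groups is the harder problem the paper addresses.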

Kris
  • 1,388
  • 6
  • 12
1

Brute force might not work out as well as you think...

Presume you have 100 variables and 20 groups:

  • You can put 1 variable in 20 different groups, which makes 20 combinations.
  • You can put 2 variables in 20 different groups each, which makes 20 * 20 = 20^2 = 400 combinations.
  • You can put 3 variables in 20 different groups each, which makes 20 * 20 * 20 = 20^3 = 8000 combinations.
  • ...
  • You can put 100 variables in 20 different groups each, which makes 20^100 combinations, more than even a low estimate of the number of atoms in the known universe (10^80); see the quick check after this list.
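
A quick sanity check of that last count (a throwaway snippet, not from the answer):

from math import log10

variables, groups = 100, 20
assignments = groups ** variables      # one group label for each variable
print(f"10^{log10(assignments):.1f}")  # prints 10^130.1; atoms are ~10^80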

OK, you can be a bit smarter about it (it doesn't matter where you put the first variable, ...) and get to something like Branch and Bound, but that will still scale horribly.

So either use a fast deterministic algorithm, like the one larsmans proposes, or, if you need a better solution and have the time to implement it, take a look at metaheuristic algorithms and the software that implements them (such as Drools Planner).

Geoffrey De Smet
  • 26,223
  • 11
  • 73
  • 120
1

You can sum the numbers and divide by the number of groups; this gives you the target value for each group's sum. Then sort the numbers and try to build subsets that add up to that target, starting with the largest values, since they cause the most variability in the sums. Once you settle on a group whose sum is close to (but not exactly) the target, you can recompute the expected sum of the remaining numbers (over n-1 groups) to minimize the RMS deviation from optimal for the remaining groups (if that's a metric you care about). Combining this "expected sum" idea with larsmans' answer should give you enough to arrive at a fast approximate answer; a rough sketch follows below. Nothing optimal about it, but far better than random, and with a nicely bounded run time.
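
A rough sketch of that recomputed-target idea, with a hypothetical helper name (fill_groups) and first-fit choices that are my own reading of the answer, not the author's code:

def fill_groups(numbers, n):
    # Fill one group at a time; the target sum is recomputed from
    # whatever is still unassigned, spread over the groups left to fill.
    remaining = sorted(numbers, reverse=True)
    groups = []
    for k in range(n, 0, -1):
        target = sum(remaining) / k       # expected sum per remaining group
        group, total = [], 0
        for x in remaining[:]:            # iterate over a copy while removing
            if not group or total + x <= target:
                group.append(x)
                total += x
                remaining.remove(x)
        groups.append(group)
    return groups

There's no backtracking, so groups can still end up lopsided; the point is the bookkeeping of re-dividing the leftover total over the n-1 (then n-2, ...) groups still to be filled.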

phkahler
  • 5,687
  • 1
  • 23
  • 31
1

Do you know how many groups you need to split it into ahead of time?

Do you have some limit to the maximum size of a group?

A few algorithms for variations of this problem:

David Cary
  • 5,250
  • 6
  • 53
  • 66