15

Basically, I have a number of values that I need to split into n different groups so that the sums of the groups are as close to one another as possible. The list of values isn't terribly long, so I could potentially just brute-force it, but I was wondering if anyone knows of a more efficient method of doing this. Thanks.

Cyborg771
  • 395
  • 1
  • 6
  • 17
  • 1
    Isn't this a variation of the subset-sum problem and potentially not computable in polynomial time? – sc_ray Mar 09 '11 at 16:42
  • Nice problem, although proposing a solution is difficult because the criteria for a "good" solution can be defined in many different ways (and since a perfect solution might not exist, you need to define a measure of quality). For example, if I come up with n-1 buckets that are all equal but the last bucket is WAY off, is that better than n/2 buckets that are all different, but within a fixed known bound of each other? – davin Mar 09 '11 at 16:43
  • 3
    I think it's actually (a generalization of) the partition problem, https://secure.wikimedia.org/wikipedia/en/wiki/Partition_problem – Fred Foo Mar 09 '11 at 16:43
  • @sc_ray, I don't think it necessarily is, because we can allow an acceptable error in accuracy, which allows for easy optimisations, like bucketing values – davin Mar 09 '11 at 16:45
  • This is also somewhat similar to the multi-processor scheduling problem (http://en.wikipedia.org/wiki/Multiprocessor_scheduling). There, the quantity to be minimised is the greatest sum of any group. Here, you don't specify what measure is used to mean "as close as possible", but presumably the mean deviation, variance, or some such. I suspect that an approximate solution to one usually isn't too bad as an approximate solution to the other, especially if you're currently uncertain how to define "as close as possible". – Steve Jessop Mar 09 '11 at 16:46
  • @Steve: "as close as possible" is well-defined, it means minimal difference between the sums. – Fred Foo Mar 09 '11 at 16:54
  • Hi, if you assign each element's value as its weight and you estimate a maximum value for each group (for example, the average of the sum of all the elements), maybe you can use the knapsack algorithm in a recursive way... or not... what do you think? – SubniC Mar 09 '11 at 16:55
  • Is `n` fixed? I.e., do you need `n` groups regardless of whether `n-1` groups would give you a "better" partition? – Jacob Mar 09 '11 at 16:57

5 Answers

9

If an approximate solution is enough, then sort the numbers in descending order, loop over them, and assign each number to the group with the smallest sum.

# Assumes `numbers` is the list of values and NUM_GROUPS the number of groups.
groups = [[] for _ in range(NUM_GROUPS)]

# Greedy: place each number, largest first, into the currently lightest group.
for x in sorted(numbers, reverse=True):
    mingroup = groups[0]
    for g in groups:
        if sum(g) < sum(mingroup):
            mingroup = g
    mingroup.append(x)
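
Calling sum(g) in the inner loop recomputes every group total on each pass; a heap of running totals gives the same greedy placement without the recomputation. A minimal sketch of that variant (the function name greedy_partition is my own, not part of the answer):

import heapq

def greedy_partition(numbers, num_groups):
    # One heap entry per group: (running_sum, tiebreak_index, members).
    heap = [(0, i, []) for i in range(num_groups)]
    heapq.heapify(heap)
    for x in sorted(numbers, reverse=True):
        total, i, group = heapq.heappop(heap)  # group with the smallest sum so far
        group.append(x)
        heapq.heappush(heap, (total + x, i, group))
    return [group for _, _, group in heap]

print(greedy_partition([90, 70, 50, 40, 30, 20], 3))  # e.g. [[90, 10...], ...]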
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • 6
    You'll get a better result if your numbers are sorted from largest to smallest; this spreads the largest numbers across the containers and then tries to balance things out using the small numbers. – Vatine Mar 09 '11 at 17:03
  • 1
    @Vatine: that's actually what I meant. Fixed the code example, thanks! – Fred Foo Mar 09 '11 at 19:45
  • No problem. In some quick tests, sort order made a small but noticeable difference when partitioning 300 random integers in the interval 1..30000 into 7 partitions: the gap between the min and max group sums came out on the order of "hundreds" instead of "tens". – Vatine Mar 09 '11 at 19:50
5

This problem is called the "multiway partition problem" and is indeed computationally hard. Googling for it turned up an interesting paper, "Multi-Way Number Partitioning", in which the author mentions the heuristic suggested by larsmans and proposes some more advanced algorithms. If the above heuristic is not enough, you may have a look at the paper, or maybe contact the author; he seems to be doing research in that area.
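
For the two-group case, one well-known step up from the greedy heuristic is Karmarkar-Karp differencing: repeatedly replace the two largest numbers with their difference (implicitly committing them to opposite groups) until one number remains; that number is the difference between the two group sums. A minimal sketch, assuming only the standard library (the function name kk_difference is mine, not from the answer):

import heapq

def kk_difference(numbers):
    # Max-heap via negation: repeatedly pop the two largest values and
    # push back their difference until a single value remains.
    heap = [-x for x in numbers]
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0  # achievable difference of the two sums

print(kk_difference([8, 7, 6, 5, 4]))  # 2, i.e. {8, 6} vs {7, 5, 4}

Note this returns only the difference; recovering the two groups themselves takes extra bookkeeping, and generalizing differencing to more than two groups is the harder problem the paper addresses.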

Kris
  • 1,388
  • 6
  • 12
1

Brute force might not work out as well as you think...

Presume you have 100 variables and 20 groups:

  • You can put 1 variable in 20 different groups, which makes 20 combinations.
  • You can put 2 variables in 20 different groups each, which makes 20 * 20 = 20^2 = 400 combinations.
  • You can put 3 variables in 20 different groups each, which makes 20 * 20 * 20 = 20^3 = 8000 combinations.
  • ...
  • You can put 100 variables in 20 different groups each, which makes 20^100 combinations, more than even a low estimate of the number of atoms in the known universe (10^80); see the quick check after this list.
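
A quick sanity check of that last count (a throwaway snippet, not from the answer):

from math import log10

variables, groups = 100, 20
assignments = groups ** variables      # one group label for each variable
print(f"10^{log10(assignments):.1f}")  # prints 10^130.1; atoms are ~10^80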

OK, you can be a bit smarter about it (it doesn't matter where you put the first variable, ...) and get to something like Branch and Bound, but that will still scale horribly.

So either use a fast deterministic algorithm, like the one larsmans proposes, or, if you need a better solution and have the time to implement it, take a look at metaheuristic algorithms and the software that implements them (such as Drools Planner).

Geoffrey De Smet
  • 26,223
  • 11
  • 73
  • 120
1

You can sum the numbers and divide by the number of groups; this gives you the target value for each group's sum. Then sort the numbers and try to build subsets that add up to that target, starting with the largest values, since they cause the most variability in the sums. Once you settle on a group whose sum is close to (but not exactly) the target, you can recompute the expected sum of the remaining numbers (over n-1 groups) to minimize the RMS deviation from optimal for the remaining groups (if that's a metric you care about). Combining this "expected sum" idea with larsmans' answer should give you enough to arrive at a fast approximate answer; a rough sketch follows below. Nothing optimal about it, but far better than random, and with a nicely bounded run time.
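
A rough sketch of that recomputed-target idea, with a hypothetical helper name (fill_groups) and first-fit choices that are my own reading of the answer, not the author's code:

def fill_groups(numbers, n):
    # Fill one group at a time; the target sum is recomputed from
    # whatever is still unassigned, spread over the groups left to fill.
    remaining = sorted(numbers, reverse=True)
    groups = []
    for k in range(n, 0, -1):
        target = sum(remaining) / k       # expected sum per remaining group
        group, total = [], 0
        for x in remaining[:]:            # iterate over a copy while removing
            if not group or total + x <= target:
                group.append(x)
                total += x
                remaining.remove(x)
        groups.append(group)
    return groups

There's no backtracking, so groups can still end up lopsided; the point is the bookkeeping of re-dividing the leftover total over the n-1 (then n-2, ...) groups still to be filled.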

phkahler
  • 5,687
  • 1
  • 23
  • 31
1

Do you know how many groups you need to split it into ahead of time?

Do you have some limit to the maximum size of a group?

A few algorithms for variations of this problem:

David Cary
  • 5,250
  • 6
  • 53
  • 66