
Given a set S, partition it into k disjoint subsets such that the difference of their sums is minimal.

For example, with S = {1,2,3,4,5} and k = 2, the answer is { {3,4}, {1,2,5} }, since their sums {7,8} have the minimal difference. For S = {1,2,3} and k = 2, it is {{1,2},{3}}, since the difference in sums is 0.
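
To make the objective concrete, here is a minimal sketch that scores a candidate partition. It assumes "difference of their sums" means the largest subset sum minus the smallest one, which is only one possible reading once k > 2. The name `spread` is made up for illustration.

```python
def spread(partition):
    """Largest subset sum minus smallest subset sum (assumed objective)."""
    sums = [sum(subset) for subset in partition]
    return max(sums) - min(sums)


# The two examples from the question.
print(spread([[3, 4], [1, 2, 5]]))  # subset sums are 7 and 8
print(spread([[1, 2], [3]]))        # subset sums are 3 and 3
```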

The problem is similar to the partition problem from The Algorithm Design Manual, except that Steven Skiena's version is solved without rearrangement: the elements must stay in their given order.

I was going to try simulated annealing, so I'm wondering whether there is a better method.

Thanks in advance.

Chris Gerken
st0le
  • This problem is *dope*. I'll definitely think about it = ) – Phonon Mar 28 '11 at 13:01
  • What do you mean by 'without rearrangement'? – dfb Mar 28 '11 at 20:22
  • @spinning_plate, in the Skiena version the order of the elements mattered, you couldn't shuffle them up... so it wasn't a "set" per se. – st0le Mar 29 '11 at 04:42
  • How do you define the "difference of their sums" when k > 2? – mbeckish Mar 29 '11 at 17:40
  • @mbeckish, I'd say something like max( sum(A)-sum(B) ) for all A,B – dfb Mar 29 '11 at 17:45
  • @spinning_plate - So you're trying to minimize the largest difference between 2 subsets' sums? – mbeckish Mar 29 '11 at 17:50
  • @mbeckish - sure, or we could do something like sum( avg(all X) - y ) for all y. It might matter for certain functions (silly optimizations like min( avg(max(subset)) for each subset(y) ) for all y), but for those two I don't think it does. – dfb Mar 29 '11 at 17:59

2 Answers

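The pseudo-polynomial-time knapsack (subset-sum) algorithm can be used for k = 2. The best we can hope for is that each subset sums to sum(S)/2. Run the knapsack algorithm, where arr[i] records whether some subset of S sums to i: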

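arr = [True] + [False] * sum(S)      # arr[0]: the empty subset sums to 0
for s in S:
    # go downwards so that each s is used at most once
    for i in range(sum(S) - s, -1, -1):
        if arr[i]:
            arr[i + s] = True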

Then look at arr[sum(S)/2], followed by arr[sum(S)/2 +/- 1], and so on. The first index that is true is the sum of one subset in the most balanced split.
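
As a runnable sketch of the k = 2 approach described above: the version below keeps the reachable sums in a dictionary instead of a boolean array, so that one subset can also be reconstructed. It assumes S contains non-negative integers; `best_two_way_split` and `parent` are illustrative names, not part of any library.

```python
def best_two_way_split(S):
    """Return (difference, subset) for the most balanced 2-way split of S.

    Assumes S holds non-negative integers.
    """
    total = sum(S)
    # parent[t] = (previous_sum, element) that first reached the sum t.
    parent = {0: None}
    for s in S:
        # Iterate over a snapshot so each element is used at most once.
        for t in list(parent):
            if t + s not in parent:
                parent[t + s] = (t, s)

    # Reachable sum closest to total / 2.
    best = min(parent, key=lambda t: abs(total - 2 * t))

    # Walk the parent links back to 0 to recover one subset.
    subset, t = [], best
    while parent[t] is not None:
        t, s = parent[t]
        subset.append(s)
    return abs(total - 2 * best), subset


diff, subset = best_two_way_split([1, 2, 3, 4, 5])
print(diff, subset)
```

Time and memory grow with sum(S), which is why this is only pseudo-polynomial.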

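For k >= 3 I believe this is NP-complete as well, like the 3-partition problem. (The k = 2 case is already NP-complete; the table above only helps because its size is pseudo-polynomial, i.e. proportional to sum(S).)

The simplest way to handle k >= 3 is to brute-force it. Here's one way; I'm not sure it's the fastest or cleanest.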

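import copy

arr = [1, 2, 3, 4]

def t(k, accum, index):
    # accum is a one-element list wrapping the partition built so far;
    # k counts the subsets that still have to be opened.
    if index == len(arr):
        # Every element is placed: valid only if exactly k subsets were opened.
        if k == 0:
            return copy.deepcopy(accum)
        else:
            return []

    element = arr[index]
    result = []

    for set_i in range(len(accum)):
        if k > 0:
            # Open a new subset that holds only this element.
            clone_new = copy.deepcopy(accum)
            clone_new[set_i].append([element])
            result.extend(t(k - 1, clone_new, index + 1))

        for elem_i in range(len(accum[set_i])):
            # Or add the element to a subset that already exists.
            clone_new = copy.deepcopy(accum)
            clone_new[set_i][elem_i].append(element)
            result.extend(t(k, clone_new, index + 1))

    return result

# Prints every partition of arr into exactly 3 non-empty subsets.
print(t(3, [[]], 0))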
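
The enumeration above lists partitions but does not pick one. A self-contained sketch of the selection step could look like the following. It is a different, simpler enumeration (every element gets a subset label, so k**n cases, with duplicates up to relabeling), and it assumes the objective is the largest subset sum minus the smallest. `best_partition` is a hypothetical name.

```python
from itertools import product


def best_partition(items, k):
    """Exhaustively find a k-way partition with the smallest max-min sum gap."""
    best_gap, best_parts = None, None
    # Each assignment gives every item a subset label in 0..k-1.
    for labels in product(range(k), repeat=len(items)):
        if len(set(labels)) < k:
            continue  # at least one subset would be empty
        sums = [0] * k
        for item, label in zip(items, labels):
            sums[label] += item
        gap = max(sums) - min(sums)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_parts = [
                [x for x, lab in zip(items, labels) if lab == j]
                for j in range(k)
            ]
    # (None, None) is returned when k exceeds the number of items.
    return best_gap, best_parts


print(best_partition([1, 2, 3, 4], 3))
```

This is only practical for small inputs, which is the motivation for the heuristic approaches discussed next.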

Simulated annealing might be good, but since the 'neighbors' of a particular solution aren't really clear, a genetic algorithm might be better suited to this. You'd start out with a population of randomly chosen partitions and 'mutate' them by moving numbers between subsets.

dfb

If the sets are large, I would definitely go for stochastic search. I don't know exactly what spinning_plate means when writing that "the neighborhood is not clearly defined". Of course it is: you either move one item from one set to another, or swap items between two different sets, and this is a simple neighborhood. I would use both operations in the stochastic search (which in practice could be tabu search or simulated annealing).
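
A minimal simulated-annealing sketch using exactly those two operations (move one item, swap two items) might look like this. Everything about it is an assumption made for illustration: the max-minus-min objective, the geometric cooling schedule, the default parameter values, and the name `anneal_partition`. Subsets are allowed to become empty, in which case they count as sum 0.

```python
import math
import random


def anneal_partition(items, k, steps=20000, t_start=10.0, t_end=0.01, seed=0):
    """Simulated annealing over k-way partitions using move and swap steps."""
    rng = random.Random(seed)
    n = len(items)

    def cost(labels):
        # Assumed objective: largest subset sum minus smallest subset sum.
        sums = [0] * k
        for item, label in zip(items, labels):
            sums[label] += item
        return max(sums) - min(sums)

    labels = [i % k for i in range(n)]  # round-robin starting point
    current = cost(labels)
    best_cost, best_labels = current, labels[:]

    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.randrange(n), rng.randrange(n)
        candidate = labels[:]
        if rng.random() < 0.5:
            candidate[i] = rng.randrange(k)  # move one item
        else:
            candidate[i], candidate[j] = labels[j], labels[i]  # swap two items
        new = cost(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= current or rng.random() < math.exp((current - new) / temp):
            labels, current = candidate, new
            if current < best_cost:
                best_cost, best_labels = current, labels[:]

    parts = [[x for x, lab in zip(items, best_labels) if lab == j] for j in range(k)]
    return best_cost, parts


print(anneal_partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
```

Recomputing the cost from scratch costs O(n) per step; for large sets it would be worth updating the subset sums incrementally instead.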

Antti Huima