
I need an algorithm to split a list of values into chunks such that the sum of the values in every chunk is (approximately) equal (it's some variation of the knapsack problem, I suppose).

So, for example [1, 2, 1, 4, 10, 3, 8] => [[8, 2], [10], [1, 3, 1, 4]]

Chunks of equal length are preferred, but it's not a constraint.

Python is the preferred language, but others are welcome as well.

Edit: the number of chunks is defined.

ts.
  • I am afraid your problem is not well defined. Is there a requirement for the number of chunks versus the deviation from totally equal sums? As currently posed this problem has a trivial solution of having exactly one chunk. – Petar Ivanov Jul 28 '11 at 07:29
  • It smells NP-hard. You should define what "approximately" means, since I believe there is no polynomial solution to find the best partition. – amit Jul 28 '11 at 07:31
  • @Petar Ivanov: I've clarified in the edit – the number of chunks is defined – ts. Jul 28 '11 at 07:32
  • @amit: that's why I am searching for approximation – ts. Jul 28 '11 at 07:32
    This is the generalized partition problem: http://en.wikipedia.org/wiki/Partition_problem, which is NP-complete. – carl Jul 28 '11 at 07:35
  • @ts: a simple greedy algorithm can assure the difference between each subset is not greater than max{S}, will that do? – amit Jul 28 '11 at 07:37
  • @amit: hard question, values in my real list can have some maxima far beyond the average (i.e. a thousand values <= 10 and one value of 1000) – ts. Jul 28 '11 at 08:04
  • @ts: @Alin's answer offers this approximation. If it is good enough – take it; if not, try to look at my suggestion of using Artificial Intelligence tools for this problem. – amit Jul 28 '11 at 08:07

5 Answers


Greedy:
1. Order the available items in descending order.
2. Create N empty groups.
3. Add the items one at a time to the group that currently has the smallest sum.

I think in most real-life situations this should be enough.
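The greedy steps above can be sketched in Python. This is my own illustration, not code from the answer; a min-heap keyed on (running sum, group index) makes finding the lightest group O(log N) per item:

```python
import heapq

def greedy_chunks(values, n):
    """Split values into n chunks with approximately equal sums.

    Implements the greedy steps above: sort descending, then repeatedly
    add the next value to the chunk whose running sum is smallest.
    """
    heap = [(0, i) for i in range(n)]   # (current sum, chunk index)
    heapq.heapify(heap)
    chunks = [[] for _ in range(n)]
    for v in sorted(values, reverse=True):
        total, i = heapq.heappop(heap)  # chunk with the smallest sum
        chunks[i].append(v)
        heapq.heappush(heap, (total + v, i))
    return chunks

# For the question's example the chunk sums come out as 10, 10 and 9:
print(greedy_chunks([1, 2, 1, 4, 10, 3, 8], 3))
```

Sorting dominates, so the whole thing is O(N log N + N log M) for N items and M chunks.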

Alin Purcaru
  • O(N log N): sorting is the bottleneck. This solution ensures the difference between two groups is at most max{S} – amit Jul 28 '11 at 08:11
    In a different thread, similar to this one, I have **proved** that max{S}-min{S} is the maximum difference for this algorithm. Have a look: http://stackoverflow.com/questions/6455703/fair-partitioning-of-set-s-into-k-partitions/6486812#6486812 – amit Jul 28 '11 at 13:03
    @amit. What about splitting `[1, 1, 1]` into two chunks? I think max(S) sounds more like the right answer. – Mad Physicist Jan 09 '19 at 08:06
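The `[1, 1, 1]` case from the last comment is easy to check with a minimal greedy split (my own illustrative code, not from any answer): the two chunk sums come out as 2 and 1, so the gap equals max(S) = 1, which supports the tighter max(S) bound:

```python
def greedy_split(values, n):
    """Greedy balanced split: each value goes to the chunk with the smallest sum."""
    chunks = [[] for _ in range(n)]
    sums = [0] * n
    for v in sorted(values, reverse=True):
        i = sums.index(min(sums))  # index of the lightest chunk
        chunks[i].append(v)
        sums[i] += v
    return chunks

print(greedy_split([1, 1, 1], 2))  # chunk sums: 2 and 1
```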

This will be faster and a little cleaner (based on the ideas above!):

def split_chunks2(l, n):
    """Split list l into n chunks with approximately equal sums."""
    result = [[] for i in range(n)]
    sums   = [0] * n
    i = 0
    for e in l:
        result[i].append(e)
        sums[i] += e
        # the next element goes to the chunk with the smallest sum so far
        i = sums.index(min(sums))
    return result

Based on @Alin Purcaru's answer and @amit's remarks, I wrote this code (Python 3.1). As far as I tested, it has linear performance in both the number of items and the number of chunks, so overall it's O(N * M). I avoid sorting the list every time by keeping the current sum of values for every chunk in a dict (this can be less practical with a greater number of chunks).

import time, random

def split_chunks(l, n):
    """
       Splits list l into n chunks with approximately equal sums of values
       see http://stackoverflow.com/questions/6855394/splitting-list-in-chunks-of-balanced-weight
    """
    result = [[] for i in range(n)]
    sums   = {i: 0 for i in range(n)}
    c = 0  # current minimum chunk sum
    for e in l:
        # find a chunk whose sum equals the current minimum and add e to it
        for i in sums:
            if c == sums[i]:
                result[i].append(e)
                break
        sums[i] += e
        c = min(sums.values())
    return result


if __name__ == '__main__':

    MIN_VALUE = 0
    MAX_VALUE = 20000000
    ITEMS     = 50000
    CHUNKS    = 256

    l =[random.randint(MIN_VALUE, MAX_VALUE ) for i in range(ITEMS)]

    t = time.time()

    r = split_chunks(l, CHUNKS)

    print(ITEMS, CHUNKS, time.time() - t)

Just because, you know, we can – the same code in PHP 5.3 (2-3 times slower than Python 3.1):

function split_chunks($l, $n){

    $result = array_fill(0, $n, array());
    $sums   = array_fill(0, $n, 0);
    $c = 0;
    foreach ($l as $e){
        foreach ($sums as $i=>$sum){
            if ($c == $sum){
                $result[$i][] = $e;
                break;  
            } // if
        } // foreach
        $sums[$i] += $e;        
        $c = min($sums);
    } // foreach
    return $result;
}

define('MIN_VALUE',0);
define('MAX_VALUE',20000000);
define('ITEMS',50000);
define('CHUNKS',128);

$l = array();
for ($i=0; $i<ITEMS; $i++){
    $l[] = rand(MIN_VALUE, MAX_VALUE);  
}

$t = microtime(true);

$r = split_chunks($l, CHUNKS);

$t = microtime(true) - $t;

print(ITEMS. ' ' .  CHUNKS .' ' . $t . ' ');
ts.

You may want to use Artificial Intelligence tools for this problem. First, define your problem:

States = {(c1, c2, ..., ck) | c1, ..., ck are subgroups of your problem, and union(c1, ..., ck) = S}
successors((c1, ..., ck)) = {switch one element from one sub-list to another}
utility((c1, ..., ck)) = max{sum(c1), sum(c2), ...} - min{sum(c1), sum(c2), ...}

Now you can use steepest-ascent hill climbing with random restarts.

This is an anytime algorithm, meaning you can start searching and, when time's up, stop it, and you will get the best result so far. The result improves as run time increases.
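A minimal sketch of the search described above, in Python. The state/successor/utility definitions follow the answer; the function names, the `restarts` parameter, and the restart strategy are my own illustrative choices, not from any library:

```python
import random

def utility(sums):
    """Objective from above: spread between heaviest and lightest chunk (0 is perfect)."""
    return max(sums) - min(sums)

def hill_climb(values, n, restarts=20, seed=0):
    """Steepest-ascent hill climbing with random restarts.

    A state is a partition of values into n chunks; a successor moves
    one element from one chunk to another.
    """
    rng = random.Random(seed)
    best_chunks, best_u = None, float('inf')
    for _ in range(restarts):
        # random initial state
        chunks = [[] for _ in range(n)]
        for v in values:
            chunks[rng.randrange(n)].append(v)
        sums = [sum(c) for c in chunks]
        while True:
            # examine every "move one element" successor, keep the best one
            best_move, u0 = None, utility(sums)
            for s in range(n):
                for k, v in enumerate(chunks[s]):
                    for d in range(n):
                        if d == s:
                            continue
                        new_sums = list(sums)
                        new_sums[s] -= v
                        new_sums[d] += v
                        u = utility(new_sums)
                        if u < u0:
                            best_move, u0 = (s, k, d), u
            if best_move is None:
                break  # local optimum reached
            s, k, d = best_move
            v = chunks[s].pop(k)
            chunks[d].append(v)
            sums[s] -= v
            sums[d] += v
        if utility(sums) < best_u:
            best_chunks, best_u = [list(c) for c in chunks], utility(sums)
    return best_chunks
```

Stopping after any number of restarts still yields the best partition seen so far, which is what makes the approach anytime.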

amit

Scala version of foxtrotmikew's answer:

def workload_balancer(element_list: Seq[(Long, Any)], partitions: Int): Seq[Seq[(Long, Any)]] = {
  // one (initially empty) chunk and one running weight per partition
  val result  = scala.collection.mutable.Seq.fill(partitions)(Seq.empty[(Long, Any)])
  val weights = scala.collection.mutable.Seq.fill(partitions)(0L)

  var i = 0
  for (e <- element_list) {
    result(i)  = result(i) :+ e        // add the element to the lightest chunk
    weights(i) = weights(i) + e._1     // update that chunk's total weight
    i          = weights.indexOf(weights.min)
  }
  result.toSeq
}

element_list should contain (weight: Long, object: Any) pairs; then you can order and split objects into different workloads (result). It helped me a lot, thanks!

IQR