Split array to approximately equal chunks

Question

How can I split an array into two chunks so that the sums of the two chunks are approximately equal?

>>> foo([10, 1, 1, 1])
[[10], [1, 1, 1]]
>>> foo([2, 5, 9, 5, 1, 1])
[[2, 5], [9, 5, 1, 1]]
>>> foo([9, 5, 5, 8, 2, 2, 18, 8, 3, 9, 4])
[[9, 5, 5, 8, 2, 2], [18, 8, 3, 9, 4]]
>>> foo([17, 15, 2, 18, 7, 20, 3, 20, 12, 7])
[[17, 15, 2, 18, 7], [20, 3, 20, 12, 7]]
>>> foo([19, 8, 9, 1, 14, 1, 16, 4, 15, 5])
[[19, 8, 9, 1], [14, 1, 16, 4, 15, 5]]

why is the last not split into 2 lists of length 5 or your first two lists not split evenly? — Padraic Cunningham, Feb 19 '15 at 09:10
That's rather vague, a simple way is just to split when it's above the average. — simonzack, Feb 19 '15 at 09:11
@Vladislav Are the elements in your _arrays_ all greater than 0? If they're not your problem is ill-posed, or at least so it seems to me. E.g., for `[1, -1, 1, -1, ..., 1, -1]` every chunk of even length is a possible solution. — gboffi, Feb 19 '15 at 10:15
@gboffi The question does not say that there has to be a unique solution.. — nullstellensatz, Feb 19 '15 at 10:53

Mazdak · Answer 1 · 2015-02-19T09:41:09.117

3

You can generate every split of the list in a loop, then pick the best pair with `min` and a suitable key:

>>> def find_min(l):
...     # Try every split point i and keep the pair whose sums differ the least.
...     return min(((l[:i], l[i:]) for i in range(len(l))),
...                key=lambda x: abs(sum(x[0]) - sum(x[1])))

Demo:

>>> l=[10, 1, 1, 1]
>>> find_min(l)
([10], [1, 1, 1])
>>> l=[9, 5, 5, 8, 2, 2, 18, 8, 3, 9, 4]
>>> find_min(l)
([9, 5, 5, 8, 2, 2], [18, 8, 3, 9, 4])
>>> l=[19, 8, 9, 1, 14, 1, 16, 4, 15, 5]
>>> find_min(l)
([19, 8, 9, 1, 14], [1, 16, 4, 15, 5])
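
The one-liner above recomputes both sums for every split point, which is quadratic in the list length. As a rough sketch (not part of the original answer), the same search can be done in one pass with a running left sum. `find_min_linear` is a hypothetical name, and a non-empty list is assumed:

```python
def find_min_linear(l):
    """Return the contiguous split with the smallest |sum(left) - sum(right)|.

    Assumption: l is non-empty. Candidate split points are 0..len(l)-1,
    the same candidates the one-liner above considers.
    """
    total = sum(l)
    left = 0                      # sum of l[:i] for the current i
    best_i, best_diff = None, None
    for i, value in enumerate(l):
        # right = total - left, so |left - right| = |total - 2 * left|
        diff = abs(total - 2 * left)
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
        left += value
    return l[:best_i], l[best_i:]


print(find_min_linear([19, 8, 9, 1, 14, 1, 16, 4, 15, 5]))
```

For the last demo list this picks the same split as `find_min`, with sums 51 and 41.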

edited Feb 19 '15 at 09:41

answered Feb 19 '15 at 09:35


I'd like to hear the reason for the downvote; if there is a problem with my answer I will correct it, and you can also let the community know the reason. – Mazdak Feb 19 '15 at 09:44
It seems that one person doesn't like this question and its answers because he couldn't figure out the problem himself, so he downvoted all of the answers and also the question :D – Mazdak Feb 19 '15 at 09:48
@IgorHatarist Yeah! ;) I think this is a very good question, and I upvoted it too! – Mazdak Feb 19 '15 at 09:49
@thefourtheye I don't get what you mean! – Mazdak Feb 19 '15 at 10:05
`foo([19, 8, 9, 1, 14, 1, 16, 4, 15, 5])` does not match the expected output. – Padraic Cunningham Feb 19 '15 at 11:12
@PadraicCunningham The output is fine, because the split it returns has the smaller difference: `>>> abs(sub(*map(sum,([19, 8, 9, 1], [14,1, 16, 4, 15, 5])))) 18 >>> abs(sub(*map(sum,([19, 8, 9, 1,14], [1, 16, 4, 15, 5])))) 10` Now, if you like, remove your downvote! – Mazdak Feb 19 '15 at 12:15
Although the question does not state it explicitly, I would think the natural assumption would be that the optimal split is at the point where the cumulative sum of the list is as close as possible to half the sum of the whole list. Any other position increases the difference between the sums of the two halves. – FuzzyDuck Feb 19 '15 at 13:24
@FuzzyDuck, so what should be the output for `foo([19, 8, 9, 1, 14, 1, 16, 4, 15, 5])`? – Padraic Cunningham Feb 19 '15 at 13:42
@PadraicCunningham, I would say `[[19, 8, 9, 1, 14], [1, 16, 4, 15, 5]]`. Any change in the partitioning point would make the difference in the sum greater than 51 - 41 = 10 which it is with this partition. Though I see your point - the OP's desired answer is different. I assume that he has made a mistake in this case. – FuzzyDuck Feb 19 '15 at 13:47
@FuzzyDuck, have you looked at the output for that input in the question? – Padraic Cunningham Feb 19 '15 at 13:47
@PadraicCunningham, yep, I edited my comment after I checked. I can only assume that the OP made a mistake, as their desired output is not really consistent with the spirit (as I interpreted it, anyway) of "approximately equal". – FuzzyDuck Feb 19 '15 at 13:49

Igor Hatarist · Accepted Answer · 2015-02-19T09:51:54.710

Something like this:

def foo(lst):
    total_sum = sum(lst)
    i = 1
    while sum(lst[:i]) < total_sum / 2:  # iterate over the list slices until we hit the "middle" 
        if sum(lst[:i+1]) >= total_sum / 2:  # also make sure that we won't go further
            break

        i += 1

    return [lst[:i], lst[i:]]

Testing:

[[10], [1, 1, 1]]                         # 10 + 3
[[2, 5], [9, 5, 1, 1]]                    # 7 + 16
[[9, 5, 5, 8, 2, 2], [18, 8, 3, 9, 4]]    # 31 + 42
[[17, 15, 2, 18, 7], [20, 3, 20, 12, 7]]  # 59 + 62
[[19, 8, 9, 1], [14, 1, 16, 4, 15, 5]]    # 37 + 55
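
Each call to `sum(lst[:i])` in the loop rescans the prefix. A possible single-pass sketch of the same stopping rule follows. `split_before_half` is a made-up name, and it assumes non-negative numbers so that the running sum never decreases:

```python
from itertools import accumulate


def split_before_half(lst):
    """Split just before the running sum reaches half of the total.

    Assumption: all items are non-negative. The first chunk always keeps
    at least one item, as in the loop above that starts at i = 1.
    """
    half = sum(lst) / 2
    i = 1
    for count, prefix_sum in enumerate(accumulate(lst), start=1):
        if prefix_sum >= half:
            # The prefix of length `count` reaches half, so stop one item earlier.
            i = max(1, count - 1)
            break
    return [lst[:i], lst[i:]]


print(split_before_half([19, 8, 9, 1, 14, 1, 16, 4, 15, 5]))
```

Like the loop above, this keeps the first chunk's sum below half the total whenever a prefix of at least one item allows it, so it reproduces the question's expected outputs rather than the smallest possible difference.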

insanemainframe · Answer 3 · 2015-02-19T16:13:19.317

from itertools import combinations
from collections import Counter


def most_equal_pairs(seq, n=None):
    seq_mapping = dict(enumerate(seq))

    if len(seq_mapping) < 2:
        raise ValueError()
    if len(seq_mapping) == 2:
        first, second = seq_mapping.values()
        yield [first], [second], abs(first - second)
        return

    ids = set(seq_mapping)

    def get_chunk_by_ids(ids):
        return [seq_mapping[i] for i in ids]

    def get_chunk_sum_by_ids(ids):
        return sum(get_chunk_by_ids(ids))

    pairs = Counter()

    for comb_len in range(1, len(ids) - 1):
        for first_comb in combinations(ids, comb_len):
            second_comb = tuple(ids - set(first_comb))
            first_sum = get_chunk_sum_by_ids(first_comb)
            second_sum = get_chunk_sum_by_ids(second_comb)
            diff = abs(first_sum - second_sum)
            pairs[(first_comb, second_comb)] = -diff

    for (first_comb_ids, second_comb_ids), diff in pairs.most_common(n):
        first_comb = get_chunk_by_ids(first_comb_ids)
        second_comb = get_chunk_by_ids(second_comb_ids)
        yield first_comb, second_comb, abs(diff)


def test(seq):
    pairs = list(most_equal_pairs(seq))
    diff_seq = []

    for first, second, diff in pairs:
        assert abs(sum(first) - sum(second)) == abs(diff)
        diff_seq.append(diff)

    assert tuple(sorted(diff_seq)) == tuple(diff_seq)
    best_pair = pairs[0]
    first, second, diff = best_pair
    return first, second, sum(first), sum(second), diff

Result:

>>> test([10, 1, 1, 1])
([10], [1, 1, 1], 10, 3, 7)

>>> test([2, 5, 9, 5, 1, 1])
([2, 9, 1], [5, 5, 1], 12, 11, 1)

>>> test([9, 5, 5, 8, 2, 2, 18, 8, 3, 9, 4])
([5, 8, 2, 2, 8, 3, 9], [9, 5, 4, 18], 37, 36, 1)

>>> test([17, 15, 2, 18, 7, 20, 3, 20, 12, 7])
([18, 3, 20, 12, 7], [17, 15, 2, 7, 20], 60, 61, 1)

>>> test([19, 8, 9, 1, 14, 1, 16, 4, 15, 5])
([19, 9, 14, 4], [8, 1, 1, 16, 15, 5], 46, 46, 0)
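
Note that the chunks above are not contiguous: this code searches every way of dividing the items into two groups. A shorter sketch of that exhaustive idea, returning only the best pair, might look like this (`best_partition` is a hypothetical name, and the running time is still exponential in the list length):

```python
from itertools import combinations


def best_partition(seq):
    """Return (first, second, diff) for the two-group partition with the
    smallest difference of sums. Groups need not be contiguous."""
    if len(seq) < 2:
        raise ValueError("need at least two items")
    indices = range(len(seq))
    total = sum(seq)
    best_diff, best_chosen = None, None
    for size in range(1, len(seq)):
        for chosen in combinations(indices, size):
            first_sum = sum(seq[i] for i in chosen)
            # second_sum = total - first_sum
            diff = abs(total - 2 * first_sum)
            if best_diff is None or diff < best_diff:
                best_diff, best_chosen = diff, chosen
    chosen_set = set(best_chosen)
    first = [seq[i] for i in best_chosen]
    second = [seq[i] for i in indices if i not in chosen_set]
    return first, second, best_diff


print(best_partition([10, 1, 1, 1]))
```

Working with indices rather than values keeps duplicate values, such as the repeated 1s, distinct.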

score 1 · Answer 4 · answered Feb 19 '15 at 13:20

Assuming that the optimal split is obtained when the list is partitioned at the point where the cumulative sum of the list is as close as possible to half the sum of the whole list:

import numpy as np

x = [19, 8, 9, 1, 14, 1, 16, 4, 15, 5]
csum = np.cumsum(x)
ix = np.argmin(abs(csum-csum[-1]/2)) + 1
result = [x[:ix], x[ix:]]

Result:

[[19, 8, 9, 1, 14], [1, 16, 4, 15, 5]]

answered Feb 19 '15 at 13:20

FuzzyDuck


A generalised approach using the same principle for `n` parts: https://stackoverflow.com/a/54024280/517835 – Milind R Jan 03 '19 at 14:31
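
One way the cumulative-sum idea could extend to `n` chunks is to cut at the prefix sums nearest to each multiple of `total / n`. The following is only a sketch in plain Python under that assumption, not the code from the linked answer. `split_n` is a hypothetical helper, and it expects a non-empty list of non-negative numbers:

```python
from itertools import accumulate


def split_n(seq, n):
    """Cut seq into n contiguous chunks at the prefix sums closest to
    k * total / n for k = 1 .. n-1.

    Assumptions: seq is non-empty and its items are non-negative, so the
    prefix sums never decrease and the cut points come out in order.
    """
    prefix = list(accumulate(seq))
    total = prefix[-1]
    cuts = []
    for k in range(1, n):
        target = total * k / n
        # Index of the prefix sum nearest the target; +1 makes it a slice end.
        nearest = min(range(len(prefix)), key=lambda j: abs(prefix[j] - target))
        cuts.append(nearest + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(split_n([19, 8, 9, 1, 14, 1, 16, 4, 15, 5], 2))
```

With `n=2` this gives the same split as the NumPy version for the sample list. Some chunks can come out empty when two targets land on the same prefix sum.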

score 0 · Answer 5 · answered Feb 19 '15 at 10:03

Here is my solution:

def sum(*args):
    total = 0
    if len(args) > 0:
        for i in args:
            for element in i:
                total += element
    return total

def foo(Input):
    size = len(Input)
    checkLeftCross = 0
    previousLeft = 0
    previousRight = 0
    currentLeft = 0
    currentRight = 0
    targetIndex = 0
    for i in range(size):
        currentLeft = sum(Input[0:i])
        currentRight = sum(Input[i:size])
        if currentLeft >= currentRight:
            targetIndex = i
            break
        else:
            previousLeft = currentLeft
            previousRight = currentRight

    diffPrev = previousRight - previousLeft
    diffCurr = currentLeft - currentRight

    if diffPrev > diffCurr:
        return Input[0:targetIndex], Input[targetIndex:size]
    else:
        return Input[0:targetIndex-1], Input[targetIndex-1:size]

def main():
    # print() calls so the script also runs under Python 3
    print(foo([2, 5, 9, 5, 1, 1]))
    print(foo([10, 1, 1, 1]))
    print(foo([9, 5, 5, 8, 2, 2, 18, 8, 3, 9, 4]))
    print(foo([17, 15, 2, 18, 7, 20, 3, 20, 12, 7]))
    print(foo([19, 8, 9, 1, 14, 1, 16, 4, 15, 5]))

if __name__ == "__main__":
    main()

Explanation:

I use a function `sum` that returns the sum of all elements of the lists passed to it.
The function `foo` returns two lists: it finds the first split point where the left sum reaches the right sum, then checks whether that split or the previous one gives the smaller difference between the two sums.

Output:

([2, 5], [9, 5, 1, 1])
([10], [1, 1, 1])
([9, 5, 5, 8, 2, 2], [18, 8, 3, 9, 4])
([17, 15, 2, 18, 7], [20, 3, 20, 12, 7])
([19, 8, 9, 1, 14], [1, 16, 4, 15, 5])
