15

I have an array of non-negative values. I want to build an array of values whose sum is 20 so that they are proportional to the first array.

This would be an easy problem, except that I want the proportional array to sum to exactly 20, compensating for any rounding error.

For example, the array

input = [400, 400, 0, 0, 100, 50, 50]

would yield

output = [8, 8, 0, 0, 2, 1, 1]
sum(output) = 20

However, most cases are going to have a lot of rounding errors, like

input = [3, 3, 3, 3, 3, 3, 18]

naively yields

output = [1, 1, 1, 1, 1, 1, 10]
sum(output) = 16  (ouch)
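
For reference, the naive result above is what plain integer truncation produces. A minimal sketch, assuming the scaling is done with floor division (the name naive_scale is just for illustration):

```python
def naive_scale(values, target=20):
    """Scale values toward target by truncation; the sum can fall short."""
    total = sum(values)
    # Floor division drops every fractional part, so the dropped
    # fractions are simply lost from the final sum.
    return [v * target // total for v in values]


print(naive_scale([400, 400, 0, 0, 100, 50, 50]))  # exact case
print(naive_scale([3, 3, 3, 3, 3, 3, 18]))         # loses 4 to truncation
```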

Is there a good way to apportion the output array so that it adds up to 20 every time?

sea-rob
  • don't understand the question... what do you mean by a "proportional array" – Magnus Apr 26 '13 at 00:45
  • Why use an integral type rather than a floating point type? – zw324 Apr 26 '13 at 00:46
  • @Magnus an array whose values sum to 20 and are proportional to the values in the first array. There's probably a better way to say it. – sea-rob Apr 26 '13 at 00:47
  • @Ziyao Wei Well, the next thing I'm going to do is encode them as single ascii characters in a string, so I can't use floats. So "integer" values is just a requirement for the solution we need. – sea-rob Apr 26 '13 at 00:48
  • 2
    Is it critical that every non-zero number is also non-zero in the solution array, or can [100, 100, 50, 50] be resolved as [20, 0, 0, 0]? This would allow some kind of decreasing sum algorithm. – Frederik.L Apr 26 '13 at 00:56
  • @Frederik I'd prefer that non-zero entries get a non-zero proportion of the total. However, I'm willing to settle. :) (+1 BTW, thank you!) – sea-rob Apr 26 '13 at 00:58
  • 1
    How close to perfectly proportional are you willing to accept? You obviously can't get exact proportions for all arrays without using floating-point. – jwodder Apr 26 '13 at 00:59
  • @jwodder Within 1 of the floating point answer would be ideal. However, I don't have a hard requirement for the precision (as background info, I'm using the string to characterize the data rather than as a way to encode/decode it.) – sea-rob Apr 26 '13 at 01:01
  • @jwodder Also, because I'm just characterizing the data, I don't think it's too important where the "error" lands -- any overs or unders can be assigned arbitrarily. – sea-rob Apr 26 '13 at 01:04
  • 2
    See http://stackoverflow.com/questions/15769948/round-a-python-list-of-numbers-and-maintain-the-sum The same principle will work for integers. – Mark Ransom Apr 26 '13 at 01:09
  • @Mark Ransom Sweet! Thank you! :) – sea-rob Apr 26 '13 at 01:11

5 Answers

15

There's a very simple answer to this question: I've done it many times. After each assignment into the new array, you reduce the values you're working with as follows:

  1. Call the first array A, and the new, proportional array B (which starts out empty).
  2. Call the sum of A elements T
  3. Call the desired sum S.
  4. For each element of the array (i) do the following:
    a. B[i] = round(A[i] / T * S). (rounding to nearest integer, penny or whatever is required)
    b. T = T - A[i]
    c. S = S - B[i]

That's it! Easy to implement in any programming language or in a spreadsheet.
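
The steps above can be sketched in Python roughly as follows. This is only an illustration: the function name apportion and the guard for a zero remaining total are my own additions, and Python's built-in round() rounds exact halves to the nearest even integer, which may differ from the rounding rule you need.

```python
def apportion(values, target):
    """Scale values to integers summing to target (running-remainder method)."""
    remaining_total = sum(values)   # T in the steps above
    remaining_target = target       # S in the steps above
    result = []
    for v in values:
        if remaining_total == 0:
            # Assumption: only zeros are left, so nothing more to hand out.
            share = 0
        else:
            share = round(v / remaining_total * remaining_target)
        result.append(share)
        remaining_total -= v
        remaining_target -= share
    return result


print(apportion([3, 3, 3, 3, 3, 3, 18], 20))
```

For the last non-zero element the remaining total equals the element itself, so it absorbs whatever is left of the target; that is why the sum comes out exact whenever the input has a positive total.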

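The resulting array's elements stay close to their ideal, non-rounded values; in your example none of them ends up more than 1 away. (That bound is not guaranteed for every input: with ten 1s followed by 90 and a target of 20, every 1 rounds to 0 and the last element gets 20 instead of the ideal 18.) Let's demonstrate with your example: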
T = 36, S = 20. B[1] = round(A[1] / T * S) = 2. (ideally, 1.666....)
T = 33, S = 18. B[2] = round(A[2] / T * S) = 2. (ideally, 1.666....)
T = 30, S = 16. B[3] = round(A[3] / T * S) = 2. (ideally, 1.666....)
T = 27, S = 14. B[4] = round(A[4] / T * S) = 2. (ideally, 1.666....)
T = 24, S = 12. B[5] = round(A[5] / T * S) = 2. (ideally, 1.666....)
T = 21, S = 10. B[6] = round(A[6] / T * S) = 1. (ideally, 1.666....)
T = 18, S = 9.   B[7] = round(A[7] / T * S) = 9. (ideally, 10)

Notice that, comparing every value in B with its ideal value in parentheses, the difference is never more than 1.

It's also interesting to note that rearranging the elements in the array can result in different corresponding values in the resulting array. I've found that arranging the elements in ascending order is best, because it results in the smallest average percentage difference between actual and ideal.

Ken Haley
  • 1
    Interesting. The last calculation will always have A[i] == T and B[i] == S, because that's all that's left in each. Way more elegant than mine. – sea-rob Aug 11 '16 at 23:42
12

Your problem is similar to proportional representation, where you want to share N seats (in your case 20) among parties proportionally to the votes they obtain, in your case [3, 3, 3, 3, 3, 3, 18].

There are several methods used in different countries to handle the rounding problem. My code below uses the Hagenbach-Bischoff quota method used in Switzerland, which basically allocates the seats remaining after an integer division by (N+1) to parties which have the highest remainder:

def proportional(nseats, votes):
    """Assign nseats seats proportionally to votes using the Hagenbach-Bischoff quota.
    :param nseats: int number of seats to assign
    :param votes: iterable of int or float weighting each party
    :result: list of ints seats allocated to each party
    """
    quota = sum(votes) / (1. + nseats)  # force float
    frac = [vote / quota for vote in votes]
    res = [int(f) for f in frac]
    n = nseats - sum(res)  # number of seats remaining to allocate
    if n == 0:
        return res  # done
    if n < 0:
        # over-allocated; see siamii's comment (the result may not sum to nseats)
        return [min(x, nseats) for x in res]
    # give the remaining seats to the n parties with the largest remainder
    remainders = [ai - bi for ai, bi in zip(frac, res)]
    limit = sorted(remainders, reverse=True)[n - 1]
    # parties whose remainder is at least limit get an extra seat
    for i, r in enumerate(remainders):
        if r >= limit:
            res[i] += 1
            n -= 1  # attempt to handle perfect equality
            if n == 0:
                return res  # done
    raise RuntimeError("could not assign all remaining seats")  # should never happen

However, this method doesn't always give the same number of seats to parties that are perfectly equal, as in your case:

proportional(20,[3, 3, 3, 3, 3, 3, 18])
[2,2,2,2,1,1,10]
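
For comparison, here is a sketch of a plain largest-remainder allocation. Note that it is a different variant from the code above: it uses the quota sum(votes)/nseats (the Hare quota) rather than the Hagenbach-Bischoff quota, and it works in integer arithmetic, so it assumes integer votes with a positive sum. The name largest_remainder is hypothetical.

```python
def largest_remainder(votes, nseats):
    """Allocate nseats by largest remainder; the result always sums to nseats."""
    total = sum(votes)
    if total <= 0:
        raise ValueError("votes must have a positive sum")
    # Exact share of party i is votes[i] * nseats / total; keep the integer
    # quotient and the remainder separately to avoid floating point.
    parts = [divmod(v * nseats, total) for v in votes]
    seats = [q for q, _ in parts]
    leftover = nseats - sum(seats)  # always smaller than len(votes)
    # Stable sort: among equal remainders, earlier parties come first.
    order = sorted(range(len(votes)), key=lambda i: parts[i][1], reverse=True)
    for i in order[:leftover]:
        seats[i] += 1
    return seats


print(largest_remainder([3, 3, 3, 3, 3, 3, 18], 20))
```

Ties are still broken by position, so equal parties can end up with different seat counts here too, but the total is exact by construction.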
jwueller
Dr. Goulu
  • 2
    +1 For the blog post http://www.drgoulu.com/2013/12/02/repartition-proportionnelle/ – yadutaf Dec 02 '13 at 20:03
  • right ... added a line to handle this : if n<0: return [min(x,nseats) for x in res] – Dr. Goulu Mar 14 '14 at 10:27
  • I've found that the code in this answer fairly often results in an off-by-one error between the sum of the seats for each party and the number of seats to be allocated. Anyone else who's looking to use an algorithm like this will, depending on their needs, be better-served using another implementation or algorithm, e.g. the D'Hondt method code published here: https://github.com/rg3/dhondt/blob/master/dhondt – tonycpsu Mar 06 '21 at 05:15
2

You have set 3 incompatible requirements. An integer-valued array proportional to [1,1,1] cannot be made to sum to exactly 20. You must choose to break one of the "sum to exactly 20", "proportional to input", and "integer values" requirements.

If you choose to break the requirement for integer values, then use floating point or rational numbers. If you choose to break the exact sum requirement, then you've already solved the problem. Choosing to break proportionality is a little trickier. One approach you might take is to figure out how far off your sum is, and then distribute corrections randomly through the output array. For example, if your input is:

[1, 1, 1]

then you could first make it sum as well as possible while still being proportional:

[7, 7, 7]

and since 20 - (7+7+7) = -1, choose one element to decrement at random:

[7, 6, 7]

If the error was 4, you would choose four elements to increment.
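
A rough Python sketch of this round-then-correct idea, under a few assumptions of mine: the input has a positive sum, corrections go to distinct randomly chosen elements, zero inputs are never incremented, and nothing is decremented below zero. The name round_then_correct and the rng parameter are illustrative only.

```python
import random


def round_then_correct(values, target, rng=random):
    """Round each proportional share, then spread the leftover error at random."""
    total = sum(values)
    result = [round(v * target / total) for v in values]
    error = target - sum(result)
    if error > 0:
        # Sum is too small: only bump elements that had a non-zero input.
        candidates = [i for i, v in enumerate(values) if v > 0]
        step = 1
    else:
        # Sum is too large (or exact): only lower elements that are positive.
        candidates = [i for i, r in enumerate(result) if r > 0]
        step = -1
    # Pick abs(error) distinct elements and move each by one.
    for i in rng.sample(candidates, abs(error)):
        result[i] += step
    return result


print(round_then_correct([1, 1, 1], 20))
```

Passing a seeded random.Random instance as rng makes the output reproducible if that matters for your use.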

Neil Forrester
  • 1
    Thank you! Excellent point... I should have said "roughly proportional", or to answer jwodder's comment above "within 1 of the proportional value" – sea-rob Apr 26 '13 at 01:07
1

A naïve solution that doesn't perform well, but will provide the right result...

Write a function that, given a candidate array and the input array (both the same length), returns the index of the element that is furthest below its proportional share, i.e. the one with the smallest candidate/input ratio (pseudocode):

function next_index(candidate, input)
    // Find the element with the smallest candidate/input ratio,
    // skipping zero inputs, which should never receive a share
    min = infinity
    min_index = 0
    for i in 1 .. length(input)
        if input[i] > 0 then
            w = candidate[i] / input[i]
            if w < min then
                min = w
                min_index = i
            end if
        end if
    end for

    return min_index
end function

Then just do this

result = [0, 0, 0, 0, 0, 0, 0]  // one zero per input element
result[next_index(result, input)]++ for 1 .. 20

When several elements tie for the smallest ratio, the result skews towards the beginning of the array.

Using the approach above, you can reduce the number of iterations by rounding down (as you did in your example) and then just use the approach above to add what has been left out due to rounding errors:

result = <<approach using rounding down>>
while sum(result) < 20
    result[next_index(result, input)]++
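
In Python, the round-down-then-fill variant might look like the sketch below. It assumes integer inputs with a positive sum, and the name greedy_fill is made up for this example.

```python
def greedy_fill(values, target):
    """Round shares down, then give each missing unit to the most under-served element."""
    total = sum(values)
    result = [v * target // total for v in values]  # rounding down
    positive = [i for i, v in enumerate(values) if v > 0]
    while sum(result) < target:
        # Smallest result/input ratio; min() returns the first such index,
        # so ties skew towards the beginning of the array.
        i = min(positive, key=lambda i: result[i] / values[i])
        result[i] += 1
    return result


print(greedy_fill([3, 3, 3, 3, 3, 3, 18], 20))
```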
mzedeler
  • There is a problem getting started above - my suggestion will always start out by filling up the array with ones. This can be avoided by adding in descending order according to ``input``. – mzedeler Apr 26 '13 at 01:10
  • Thank you! I'll work through this tonight with some examples and see how it looks. – sea-rob Apr 26 '13 at 01:11
0

So the answers and comments above were helpful... particularly the decreasing sum comment from @Frederik.

The solution I came up with takes advantage of the fact that for an input array v, sum(v_i * 20) is divisible by sum(v). So for each value in v, I multiply by 20 and divide by the sum. I keep the quotient and accumulate the remainder. Whenever the accumulator plus the current remainder reaches sum(v), I add one to a value and subtract sum(v) from the accumulator. That way I'm guaranteed that all the remainders get rolled into the results.

Is that legible? Here's the implementation in Python:

def proportion(values, total):
    # set up by getting the sum of the values and starting
    # with an empty result list and accumulator
    sum_values = sum(values)
    new_values = []
    acc = 0

    for v in values:
        # for each value, find quotient and remainder
        q, r = divmod(v * total, sum_values)

        if acc + r < sum_values:
            # if the accumulator plus remainder is too small, just add and move on
            acc += r
        else:
            # we've accumulated enough to go over sum(values), so add 1 to result
            if acc > r:
                # add to previous
                new_values[-1] += 1
            else:
                # add to current
                q += 1
            acc -= sum_values - r

        # save the new value
        new_values.append(q)

    # accumulator is guaranteed to be zero at the end
    print(new_values, sum_values, acc)

    return new_values

(I added an enhancement that if the accumulator > remainder, I increment the previous value instead of the current value)

sea-rob