I have a snippet of code that takes a large data set and turns it into a proportional, smaller data set. Let me explain:

Let's say you have 20 blue marbles and 10 red marbles. If I wanted to represent this data with 3 marbles, I would use 2 blue marbles and 1 red marble.

I don't mind if it's not exact. For example, to represent 17 blue and 16 red with 4 marbles, the closest proportional representation would be 2 blue and 2 red, and that's okay.

This is my code in Python:

from random import randrange

data_set = [randrange(100, 1000) for x in range(5)]
required_amount = 20
special_number = required_amount / sum(data_set)
proportional_data_set = [round(x * special_number) for x in data_set]

print(data_set)
print(required_amount)
print(proportional_data_set)
print(sum(proportional_data_set))

The problem is that I ask for a sample of 20, but sometimes the proportional data set totals 21 or 19. I'm assuming it's because each value is rounded independently, but does anyone know a better way to solve this problem?

A sample output working correctly would be:

[832, 325, 415, 385, 745]
20
[6, 2, 3, 3, 6]
20

A sample output working incorrectly would be:

[414, 918, 860, 978, 438]
20
[2, 5, 5, 5, 2]
19
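
To see where the missing marble goes, it can help to print the unrounded shares for the failing sample above. This is only an illustrative sketch; the names `scale`, `exact_shares` and `rounded` are mine and not part of the original snippet.

```python
# Failing sample from the question.
data_set = [414, 918, 860, 978, 438]
required_amount = 20

# Same scaling as the original code, but keep the unrounded values.
scale = required_amount / sum(data_set)
exact_shares = [x * scale for x in data_set]

# Each share is rounded on its own, so nothing forces the total to stay at 20.
rounded = [round(s) for s in exact_shares]

print([f"{s:.3f}" for s in exact_shares])
print(rounded, sum(rounded))
```

The exact shares add up to 20, but four of the five round down and only one rounds up, so the rounded total ends up one short.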

If anyone knows of any similar methods that do something like this that would also be great.

Brian Tompsett - 汤莱恩
  • possible duplicate of [Allocate an array of integers proportionally compensating for rounding errors](http://stackoverflow.com/questions/16226991/allocate-an-array-of-integers-proportionally-compensating-for-rounding-errors) – tzaman Oct 31 '14 at 00:48
  • Thanks for the resource. Although it doesn't quite answer my question. – ThatOneGuyInXNA Oct 31 '14 at 01:31

2 Answers

Here's one way to solve the problem. Calculate special_number as the number of units in data_set per "marble", rounded down. Then use divmod() to get both the whole number of marbles for each item and the remainder. Since divmod() returns an integer quotient (rounded down), sum(proportional_data_set) will be less than required_amount in most cases.

Finally, use a loop: find the item with the highest remainder and increment its entry in proportional_data_set, repeating until sum(proportional_data_set) == required_amount.

from random import randrange

data_set = [randrange(100, 1000) for x in range(5)]
required_amount = 20
special_number = sum(data_set) // required_amount

print("Data set:")
print(data_set)
print("Special number:")
print(special_number)

# divmod() returns a pair of numbers, split them into quotients and remainders
pairs = [divmod(x, special_number) for x in data_set]
proportional_data_set = [x[0] for x in pairs]
remainder = [x[1] for x in pairs]

print()
print("Proportional data set before adjusting:")
print(proportional_data_set, "=", sum(proportional_data_set))
print("Remainders:")
print(remainder)

while sum(proportional_data_set) < required_amount:
    i = remainder.index(max(remainder))    # index of the highest remainder
    proportional_data_set[i] += 1          # add another marble to this index
    remainder[i] = -1                      # don't use this remainder again

print()
print("Proportional data set after adjusting:")
print(proportional_data_set, "=", sum(proportional_data_set))
print("Remainders:")
print(remainder)

The output looks like:

Data set:
[546, 895, 257, 226, 975]
Special number:
144

Proportional data set before adjusting:
[3, 6, 1, 1, 6] = 17
Remainders:
[114, 31, 113, 82, 111]

Proportional data set after adjusting:
[4, 6, 2, 1, 7] = 20
Remainders:
[-1, 31, -1, 82, -1]

The highest remainders were used to increment the proportional data set and then set to -1.
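
The same idea can be wrapped in a reusable function. The sketch below is a variant rather than the exact code above: instead of a floored special_number, it takes each item's exact share `x * required_amount / sum(data_set)` and splits it into a whole part and a remainder with integer divmod(), so the whole parts can never add up to more than required_amount. The function name `largest_remainder` is hypothetical, and the sketch assumes non-negative integer inputs.

```python
def largest_remainder(required_amount, data_set):
    """Split required_amount across data_set proportionally (sketch only)."""
    total = sum(data_set)
    if total <= 0:
        raise ValueError("data_set must have a positive total")

    # Whole marbles and leftover for each item, in exact integer arithmetic.
    pairs = [divmod(x * required_amount, total) for x in data_set]
    result = [quotient for quotient, _ in pairs]

    # Marbles still to hand out after rounding everything down.
    shortfall = required_amount - sum(result)

    # Indices ordered by remainder, largest first; ties keep input order.
    by_remainder = sorted(range(len(data_set)),
                          key=lambda i: pairs[i][1], reverse=True)
    for i in by_remainder[:shortfall]:
        result[i] += 1
    return result


print(largest_remainder(3, [20, 10]))                      # [2, 1]
print(largest_remainder(4, [17, 16]))                      # [2, 2]
print(largest_remainder(20, [832, 325, 415, 385, 745]))    # [6, 2, 3, 3, 6]
print(largest_remainder(20, [414, 918, 860, 978, 438]))    # [2, 5, 5, 5, 3]
```

Every result adds up to the requested amount. For the last input the final marble goes to 438 (exact share about 2.428) rather than 978 (about 5.421), because its fractional part is slightly larger; methods that use a different quota may break such close calls differently.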

Brent Washburne
  • Thanks, but let's say the proportional data set added up to more than the required number instead of less. I don't think this would subtract from it after adjusting. – ThatOneGuyInXNA Oct 31 '14 at 22:25
  • The Hagenbach-Bischoff quota method solves this problem by initially dividing by (N+1), which guarantees that one only has to add to the initial allocation. The same principle could be applied to this method, I think. – Simon Oct 31 '14 at 22:31
  • The proportional data set starts with values rounded down, with the fractional part in the remainder. There is no way the sum of the proportional data set could be more than the required number. There is no need to use (N+1), either. – Brent Washburne Oct 31 '14 at 23:12
  • @BrentWashburne: Ah yes, I see the initial rounding down now. – Simon Nov 01 '14 at 00:52
I was going to offer a solution based on a Bresenham line between the cumulative sum of the input data and the cumulative sum of the proportional output values, but (a) it turns out to give the wrong answer (see below), and (b) I believe @tzaman's pointer to "Allocate an array of integers proportionally compensating for rounding errors" provides a simpler solution than any correction I could make to the Bresenham method. The proportional() function below is by @Dr. Goulu:

def proportional(nseats,votes):
    """assign n seats proportionally to votes using Hagenbach-Bischoff quota
    :param nseats: int number of seats to assign
    :param votes: iterable of int or float weighting each party
    :result: list of ints seats allocated to each party
    """
    quota=sum(votes)/(1.+nseats) #force float
    frac=[vote/quota for vote in votes]
    res=[int(f) for f in frac]
    n=nseats-sum(res) #number of seats remaining to allocate
    if n==0: return res #done
    if n<0: return [min(x,nseats) for x in res] # see siamii's comment
    #give the remaining seats to the n parties with the largest remainder
    remainders=[ai-bi for ai,bi in zip(frac,res)]
    limit=sorted(remainders,reverse=True)[n-1]
    #n parties with remainder larger than limit get an extra seat
    for i,r in enumerate(remainders):
        if r>=limit:
            res[i]+=1
            n-=1 # attempt to handle perfect equality
            if n==0: return res #done
    raise RuntimeError("could not allocate all seats") #should never happen

print (proportional(20,[832, 325, 415, 385, 745]))
print (proportional(20,[414, 918, 860, 978, 438]))

... gives the output:

[6, 2, 3, 3, 6]
[2, 5, 5, 6, 2]

... as required.

Bresenham line (non-)solution

For those who may be interested in the Bresenham line (non-)solution, here it is, based on the code here:

import itertools, operator

def bresenhamLine(x0, y0, x1, y1):
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    points = []
    x, y = x0, y0
    while True:
        points += [(x, y)]
        if x == x1 and y == y1:
            break
        e2 = err * 2
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy
    return points

def proportional(n,inp):
    cumsum = list(itertools.accumulate(inp))
    pts = bresenhamLine(0,0,max(cumsum),n)
    yval = [y for x,y in pts]
    cumsum2 = [yval[x] for x in cumsum]
    res = [cumsum2[0]]
    for i,x in enumerate(cumsum2[1:]):
        res.append(x-cumsum2[i])
    return res

print (proportional(20,[832, 325, 415, 385, 745]))
print (proportional(20,[414, 918, 860, 978, 438]))

... however the output is

[6, 3, 3, 2, 6]
[2, 5, 5, 6, 2]

... which is incorrect: among the second to fourth items of the first list (325, 415, 385), it assigns "2" to the middle-ranked item (385) rather than the lowest-ranked item (325). The Hagenbach-Bischoff quota method gets this allocation correct.
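
For what it's worth, the misallocation can be seen without drawing a line at all. The sketch below uses plain cumulative rounding: scale each running total, round it, and take differences. That is a different technique from the Bresenham code above, and I am not claiming the two are identical in general, only that it gives the same two outputs for these inputs. The name `cumulative_rounding` is hypothetical.

```python
from itertools import accumulate


def cumulative_rounding(n, inp):
    """Allocate n units by rounding scaled running totals (sketch only)."""
    total = sum(inp)
    # Nearest integer to c * n / total, halves rounded up, integers only.
    marks = [(2 * c * n + total) // (2 * total) for c in accumulate(inp)]
    # Each item gets the gap between consecutive rounded running totals.
    return [b - a for a, b in zip([0] + marks, marks)]


print(cumulative_rounding(20, [832, 325, 415, 385, 745]))   # [6, 3, 3, 2, 6]
print(cumulative_rounding(20, [414, 918, 860, 978, 438]))   # [2, 5, 5, 6, 2]

# Same values with 325 and 385 swapped: now 385 gets 3 and 325 gets 2.
print(cumulative_rounding(20, [832, 385, 415, 325, 745]))   # [6, 3, 3, 2, 6]
```

The total is always right because the last running total scales to exactly n, but each item's share depends on where it sits in the list, which is why 385 ends up with 2 in one ordering and 3 in the other.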

Community
Simon