
Today a friend of mine showed me a bug on a website. (Link)

You can see that the sum of the percentages is 100.1%.

49 + 20.7 + 10.9 + 7 + 5.5 + 7 = 100.1%

And I thought: if I were programming and had those six (float) numbers, how could I solve this problem?


Assuming all the numbers have one decimal place:

49.0 + 20.7 + 10.9 + 7.0 + 5.5 + 7.0 = 100.1%

I believe this is a rounding-half problem; I don't see another explanation for the error.

e.g.:

49.5% + 50.5% = 100%; if we round halves up, we get 50% + 51% = 101%.

But in this case, because we have two numbers, we could use round half to even.

49.5% + 50.5% = 100%; if we round half to even, we get 50% + 50% = 100%.

The numbers are already tainted: since their sum is 100.1%, at least one number must have been rounded up (its true value is the displayed one minus 0.05).

Round half to even doesn't work in this six-number example, because it fails on the following sum:

49 + 20.65 + 10.85 + 7 + 5.5 + 7 = 100% (exact values)

49 + 20.6 + 10.8 + 7 + 5.5 + 7 = 99.9% (rounded half to even)
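That failure can be checked mechanically. Here is a small sketch (JavaScript, working in hundredths of a percent so the .5 ties are exact integers rather than inexact floats; the values are the ones from the sum above):

```javascript
// hundredths of a percent, so the .5 ties are exact integers
const hundredths = [4900, 2065, 1085, 700, 550, 700];   // sums to 10000 = 100%

// round half up (ties go toward +infinity, like Math.round)
const halfUp = hundredths.map(h => Math.round(h / 10));

// round half to even (banker's rounding)
const halfEven = hundredths.map(h => {
  const q = Math.trunc(h / 10), r = h % 10;
  if (r === 5) return q % 2 === 0 ? q : q + 1;   // tie: round to the even tenth
  return r > 5 ? q + 1 : q;                       // otherwise round to nearest
});

console.log(halfUp.reduce((a, b) => a + b, 0) / 10);     // 100.1
console.log(halfEven.reduce((a, b) => a + b, 0) / 10);   // 99.9
```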

Is there any easy/quick way to solve this problem? A different method of rounding?

Or can I only solve this by writing a function that handles my problem specifically?


Note: I'm not looking for a language-specific approach. I hope to be able to solve this problem in different languages (e.g. in both PHP and C++).

Edu
  • I don't understand since those numbers do add to 100% so they must not be from the same sample set. – dbarnes Feb 05 '14 at 15:30
  • @dbarnes They don't. Make the sum yourself if you don't trust the sum I wrote. – Edu Feb 05 '14 at 23:20
  • Since the numerical error was introduced by missing precision due to truncated string representations of numbers (in whichever language they used, presumably PHP), how would you solve this without actually using a specific language? We can't magically recover the missing precision, which is what most of the answers are telling you. – smci Apr 30 '14 at 09:57

3 Answers


First, your analysis about the probable cause is not the actual cause here. But the concept you're talking about is called bias in rounding schemes. It does exist and is real - it just isn't the cause of your specific issue here.

In the example you claim is causing this: 49.5% + 50.5% = 100%; rounded up, 50% + 51% = 101%. Rounding up (rounding towards positive infinity) is equivalent to round-away-from-zero for positive numbers. See the list of rounding schemes below [1]. But again, that's pretty unlikely to be the cause here, unless you happened to get two adjacent numbers which were identically equal to a.5 and b.5, not a.7 + b.8 + c.5 or any other combination of digits. To see why this is unlikely to be the actual cause: in a list of n numbers there are (n-1) adjacent pairs, and if we make the reasonable assumption that every last digit is equally probable, then the chance of a given pair being a.5, b.5 is only (0.1)^2 = 0.01.

Anyway the real cause here is the numerical error introduced by missing precision (due to truncated representations of numbers converted to strings '%2.1f') (in whichever language they used, presumably PHP, Javascript or Java)...

The usual and simplest solution is to just carry more precision. Strictly you might only need one (or two) extra digits here, but IEEE 754 single-precision floats give you 23 bits of mantissa (about 7 significant decimal digits) for free, so everyone uses that.
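To illustrate, here is a sketch in JavaScript. The raw counts are made up (we don't know the site's real underlying data); they are chosen so the exact percentages sum to exactly 100 while the one-decimal displayed values sum to 100.1, reproducing the bug:

```javascript
// hypothetical raw counts behind percentages like the ones on the page
const counts = [4900, 2065, 1090, 700, 550, 695];
const total = counts.reduce((a, b) => a + b, 0);     // 10000

// exact percentages: [49, 20.65, 10.9, 7, 5.5, 6.95], sum to 100 (up to float noise)
const exact = counts.map(c => c * 100 / total);

// display-rounded to one decimal: work in tenths so the .5 ties are exact,
// then round half up, as a typical '%2.1f'-style formatter effectively does
const tenths = counts.map(c => Math.round(c * 1000 / total));
const displayed = tenths.map(t => t / 10);           // [49, 20.7, 10.9, 7, 5.5, 7]

console.log(tenths.reduce((a, b) => a + b, 0) / 10); // 100.1
```

Keeping the exact values internally and rounding only at display time is the "carry more precision" fix: the internal sum stays at 100 and only the printed digits disagree.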

However, if you really insist on setting yourself the (artificial) challenge of rounding numbers with missing precision under the constraint that they must sum to 100.0% (or at least maximize the chance that they do), there are several lesser-used rounding schemes. You can find these in textbooks, but they aren't used much in the real world, for the obvious reason that they introduce randomness and possibly nondeterminism (although you could set the random seed to at least ensure reproducibility).

So for whatever it's worth here are those rounding schemes (and many others, see the entire article):

[1] http://en.wikipedia.org/wiki/Rounding#Tie-breaking

The first four of the following introduce bias in the q = .5 case; round half to even and round half to odd are unbiased overall, but as you found they still don't guarantee the rounded values sum to 100% (whereas carrying extra precision makes the issue go away):

  • Round half up
  • Round half down
  • Round half away from zero
  • Round half towards zero
  • Round half to even
  • Round half to odd

Now here are the ones of interest to you:

  • Stochastic rounding: Another unbiased tie-breaking method is stochastic rounding:

If the fractional part of y is .5, choose q randomly among y + 0.5 and y − 0.5, with equal probability. Advantages: essentially free of overall bias; but it is also 'fair' among even and odd q values. On the other hand, it introduces a random component into the result; performing the same computation twice on the same data may yield two different results. Also, it is open to nonconscious bias if humans (rather than computers or devices of chance) are "randomly" deciding in which direction to round.

  • Alternating tie-breaking: One method, more obscure than most, is round half alternatingly.

If the fractional part is 0.5, alternate round up and round down: for the first occurrence of a 0.5 fractional part, round up; for the second occurrence, round down; and so forth. This suppresses the random component of the result, if occurrences of 0.5 fractional parts can be effectively numbered. But it can still introduce a positive or negative bias according to the direction of rounding assigned to the first occurrence, if the total number of occurrences is odd.
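As a sketch of how alternating tie-breaking plays out on the .5 example from the question (the hundredths-of-a-percent scaling is my assumption, chosen so the ties are exact integers rather than inexact floats):

```javascript
let roundUpNext = true; // direction assigned to the first tie

// round a value in hundredths of a percent to tenths, alternating on .5 ties
function roundToTenthsAlternating(hundredths) {
  const q = Math.trunc(hundredths / 10);
  const r = hundredths % 10;
  if (r === 5) {                      // exact tie: alternate the direction
    const t = roundUpNext ? q + 1 : q;
    roundUpNext = !roundUpNext;
    return t;
  }
  return r > 5 ? q + 1 : q;           // otherwise round to nearest
}

// 49 + 20.65 + 10.85 + 7 + 5.5 + 7, as integers in hundredths (sums to 10000)
const hundredths = [4900, 2065, 1085, 700, 550, 700];
const tenths = hundredths.map(roundToTenthsAlternating);

console.log(tenths.map(t => t / 10));                 // [49, 20.7, 10.8, 7, 5.5, 7]
console.log(tenths.reduce((a, b) => a + b, 0) / 10);  // 100
```

Note it lands on exactly 100.0 here only because there happen to be an even number of ties; with an odd count, the direction assigned to the first occurrence biases the sum, as the quoted text warns.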

If you want to read all about this stuff (computer arithmetic, and the hardware circuits that implement it), one good reference (that goes heavy on the hardware side) is

Israel Koren, Computer Arithmetic Algorithms, 2nd Edition, University of Massachusetts Amherst, 2010. www.ecs.umass.edu/ece/koren/arith/

smci
  • About the first part, I already know the error is due to rounding. About the suggestions, you talk about .5, but what if we have the sum `33.7+33.7+32.6`? That wouldn't work. The question was just to ask whether there is an easy way to round; after all this time I think the only way must really be just to round up the highest and down the lowest. – Edu Apr 30 '14 at 23:05
  • Yes, that's exactly what I said to you: There is no magical rounding-scheme that will infer your sum 100.1 is 0.1 too high and preemptively fix it. If you want to invent a funky new rounding-scheme that explicitly introduces even more bias, so that 0.6 is sometimes rounded down to .0 40% of the time, that's your prerogative. Again, your sensible choices are a) carry more precision (what people generally do, since float precision is essentially free) or b) manually scale/correct/adjust one of the terms to constrain that the sum adds to 100.0% (which is equivalent to rounding after-the-fact). – smci May 01 '14 at 07:00
  • Edu, you're really talking about a new custom rounding-scheme introducing extra bias towards rounding down when the sum is too high, or towards rounding up when the sum is too low. So you do the sum once with normal rounding (typically round-0.5-upwards), check the aberration between the sum and 100.0, figure out whether you want extra rounding bias down or up, repeat the calculation until you get the 100.0 result. If you ***really*** want to implement this, go ahead and implement it, it's not hard. Honest to God it will be more undesirable than simply carrying extra precision. That's all! – smci May 01 '14 at 07:08
  • I just asked to know if there was a technique I wasn't aware of. Normally extra precision would do the trick, but sometimes you need an integer to save memory space or to occupy a smaller GUI area. – Edu May 01 '14 at 21:22
  • Integers don't save memory; floats fit into 32 bits just the same, and I gave you the link to an SO discussion about that. And as for *'occupy smaller GUI area'*, don't confuse the internal precision with the precision you display in the GUI. Floats have [7.2 significant figures](http://stackoverflow.com/questions/12815179/number-of-significant-digits-for-a-floating-point-type), but you're not obligated to display any or all of them. In this case, you would probably round what you display to 1 or 2 decimal places, but obviously keep full float precision internally. – smci May 02 '14 at 08:13
  • ^ The tl;dr is ***Just display '%2.2f' or else '%2.3f'. Internally, use floats (7.2 significant figures, hence at least 5 decimal places on percentages)*** – smci May 02 '14 at 21:31

You should not accumulate rounded values but rather use the (nearly) exact values.

Another dirty way to solve it could be:

if (sum > 100){
    sum = 100;
}

An alternative would be to correct the values by the difference of the two sums, as described in the comments (the implementation below is in JavaScript, quick & dirty and only for demonstration purposes; it also only works when rounding to full integers, otherwise it needs to be modified a bit):

var numbers = [49, 20.7, 10.8, 7, 5.5, 7];
var roundedNumbers = [49, 21, 11, 7, 6, 7];

// JavaScript arrays have no built-in sum(), so use reduce instead
function sumOf(arr) {
    return arr.reduce(function (a, b) { return a + b; }, 0);
}

var sum = sumOf(numbers);
var roundedSum = sumOf(roundedNumbers);

// compare with a tolerance: floating-point sums are rarely exactly equal
while (Math.abs(roundedSum - sum) > 1e-9) {
    var isRoundedSumLarger = roundedSum > sum;
    var maxDifferenceIndex = -1;
    var maxDifferenceValue = 0;
    // find the entry that was rounded furthest in the direction we need to undo
    for (var n = 0; n < numbers.length; n++) {
        var difference = Math.abs(roundedNumbers[n] - numbers[n]);
        if ((isRoundedSumLarger && roundedNumbers[n] > numbers[n] && maxDifferenceValue < difference)
          || (!isRoundedSumLarger && roundedNumbers[n] < numbers[n] && maxDifferenceValue < difference)) {
            maxDifferenceValue = difference;
            maxDifferenceIndex = n;
        }
    }
    var modifyValue = (isRoundedSumLarger ? -1 : 1); // or whatever granularity the rounding algorithm is
    roundedNumbers[maxDifferenceIndex] += modifyValue;
    roundedSum += modifyValue;
}
console.log(numbers, sum);               // [49, 20.7, 10.8, 7, 5.5, 7] 100
console.log(roundedNumbers, roundedSum); // [49, 21, 11, 7, 5, 7] 100
Flixer
  • Not really the answer I was looking for. You could be right, since rounding the two .5s, one up and one down (it doesn't matter which), would give the result I want (100%). It would still be a rounding error, but that way I get no error in the sum, only a false sum. Rounding the numbers that way would hide the .5s and present numbers that add up to 100%. – Edu Feb 05 '14 at 15:34
  • OK, what about this: you calculate both sums, the sum of the rounded values and the sum of the exact values. If both are equal, everything is fine. If not, you calculate the difference between the two sums. If it is positive (rounded sum > exact sum), you afterwards round down those elements which were rounded up and have the lowest value after the decimal place... hard to explain, I will try to write an algorithm for it and post it here in the comments – Flixer Feb 05 '14 at 15:43
  • In the last lines of my question I wrote 3 questions precisely to avoid answers with functions. ///////// I have no problem in writing a function that gives the numbers I need. I could do something like: If the sum using ".5 always rounds up" is above 100%, I start rounding down random .5 numbers until I get the 100%. ///////// My question is: Is there a different method of tie-breaking that gives a sum of 100%? In this wikipedia page you can see examples of different tie-breaks: [link](http://en.wikipedia.org/wiki/Rounding#Tie-breaking) That's why I even ask if something like this exists... – Edu Feb 09 '14 at 18:00
  • @Edu, you're really talking about a new custom rounding-scheme introducing extra bias towards rounding down when the sum is too high, or rounding up when the sum is too low. So you do the calculation once with normal rounding (typically round-0.5-upwards), check the aberration between the sum and 100.0, figure out whether you want extra rounding bias down or up, repeat the calculation until you get the 100.0 result. If you ***really*** want to implement this, go ahead and implement it, it's not hard. – smci May 01 '14 at 07:06

The 'error' is inherently introduced when calculating the percentages and dropping the accuracy by rounding down/up. Naturally when you sum those rounded numbers you're going to get a different total.

What I think you're suggesting is fudging the rounded figures so that they sum to the 'original' total, which you could do programmatically like this, for example:

1. Sum the percentages: 100.1.
2. Calculate the error: 100 - 100.1 = -0.1.
3. Select any one of the percentages and add the error to it.
4. Sum them again; the total will equal 100.
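A minimal sketch of those four steps in JavaScript (which entry absorbs the correction is arbitrary; picking the largest one is my assumption, since the step allows any one):

```javascript
const pcts = [49, 20.7, 10.9, 7, 5.5, 7];

// 1. sum the percentages, working in tenths to sidestep float noise
const sumTenths = pcts.reduce((a, p) => a + Math.round(p * 10), 0);  // 1001

// 2. calculate the error relative to 100%
const errorTenths = 1000 - sumTenths;                                // -1

// 3. apply the whole error to one percentage (here: the largest)
const i = pcts.indexOf(Math.max(...pcts));
pcts[i] = (Math.round(pcts[i] * 10) + errorTenths) / 10;

// 4. the displayed values now sum to 100
console.log(pcts);                                                   // [48.9, 20.7, 10.9, 7, 5.5, 7]
console.log(pcts.reduce((a, p) => a + Math.round(p * 10), 0) / 10);  // 100
```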

Of course this means the amended percentage no longer reflects the original data (even if only by a fractional amount), and there is no correct way to do this.

The reality is that there's no 'problem' with the original maths, only with semantics. I would be inclined to either specify that the figures are rounded, or show the full fractional amount.

Bagofjuice
  • You're right that I will create a rounding error that spreads to the sum. But when we round, we accept that error; in this case I accept a 0.05 error in order to get just one decimal place. And yes, when we get a -0.1 error, we know that one of the .05s should be rounded down instead of up, and with a +0.1 error, the opposite. – Edu Feb 05 '14 at 16:07