This may be a repeated topic, but my problem is different and the existing answers didn't help me.
I want to generate some random values from the uniform distribution over (0,1) in my Python application and normalize them so that their sum equals one.
The problem is that the sum of the normalized values is not exactly equal to one. For each group of normalized random values, the sum comes out as, for example, 0.99999999999 or 1.00000000056. I have tried some tricks but couldn't solve the issue. This is what I have done so far:

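import random
import numpy as np

probs = []
for i in range(10):
    probs.append(random.uniform(0, 1))

prob = np.array(probs)
prob = prob / prob.sum()  # normalize the whole array at once

total = 0.0  # named `total` to avoid shadowing the built-in `sum`
for p in prob:
    total += p

print(total)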

Does anyone know how to generate these numbers so that their sum equals exactly 1?

samiravz
  • Check the 4th option [here](https://math.stackexchange.com/questions/278418/normalize-values-to-sum-1-but-keeping-their-weights) – AnkurSaxena Feb 05 '21 at 06:59
  • I may be wrong, but I think that either probs is not really random (it is never truly random, so this might not be a problem), or you cannot guarantee that the sum is exactly one with a finite number of digits after the decimal point. – joostblack Feb 05 '21 at 07:12
  • @AnkurSaxena I tried that solution. It still gives ```0.999999``` for about 30% of the groups of numbers. – samiravz Feb 05 '21 at 07:18

1 Answer

The sum is not exactly equal to 1.0 because of floating point rounding errors, which are not specific to NumPy or Python. However, you can modify the last element to absorb that error, like this:

import numpy as np

probs = np.random.random(10)
probs = probs / probs.sum()
probs[-1] = 1 - probs[:-1].sum()  # let the last element absorb the rounding error
print(probs.sum())  # typically prints 1.0
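
As a side illustration that the effect comes from floating point arithmetic itself rather than from NumPy, here is a minimal sketch in plain Python. The list of ten `0.1` values is an arbitrary example, not taken from the question; `math.fsum` is the standard library's accurately rounded summation.

```python
import math

values = [0.1] * 10  # ten copies of 0.1; mathematically they sum to 1

# Plain left-to-right addition: each step rounds, and the errors accumulate.
running_total = 0.0
for v in values:
    running_total += v

# math.fsum tracks the lost low-order bits and rounds only once at the end.
accurate_total = math.fsum(values)

print(running_total == 1.0)   # False (the running total is 0.9999999999999999)
print(accurate_total == 1.0)  # True
```

Even with `math.fsum`, an exact 1.0 is only possible when the correctly rounded sum happens to be 1.0, so this shows where the error comes from rather than guaranteeing an exact total for arbitrary normalized values.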
armamut
  • Even this can't guarantee the result adds up to 1 exactly. Floating point numbers (at least in any system based on a fraction times a base raised to an exponent, such as IEEE 754) are more closely spaced near 0 than farther away. So if the sum of all but the last element is small enough, then 1 - sum will be in a less closely spaced range, and it won't be exactly representable. Whether the array `probs` can possibly contain such numbers depends, I guess, on exactly how it is constructed. – Robert Dodier Feb 05 '21 at 07:24
  • I agree with Robert Dodier's comment. But summing one by one would always lead you to floating point errors. One strategy would be to give up and accept small differences. – armamut Feb 05 '21 at 07:31
  • @armamut Thanks. I came to the same conclusion. I have to ignore that difference. – samiravz Feb 05 '21 at 07:37
  • I'm with @RobertDodier, there is NO GUARANTEE the sum would be exactly 1. – Severin Pappadeux Feb 05 '21 at 16:02
  • Agreed. This approach is an improvement, but there's no guarantee. – armamut Feb 05 '21 at 16:40
  • Well, it's not quite true that "summing one by one would always lead you to floating point errors". In some cases it does, and in some cases it doesn't. Distinguishing the cases is a very minor problem and mostly I'm just interested in reasoning about how floating point numbers work. – Robert Dodier Feb 05 '21 at 18:06
  • Yes, you are right. This is also true. I didn't want to mislead readers. I misused the word "always". I meant it's not guaranteed, but in my experience, more often than not you will not get an exact sum. Interested readers should learn the details of floating point numbers. It's indeed a very complicated problem. For example, see https://stackoverflow.com/questions/33004029/is-numpy-sum-implemented-in-such-a-way-that-numerical-errors-are-avoided – armamut Feb 05 '21 at 19:24
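
To make two points from the comments concrete, here is a minimal sketch (an illustration, not part of the original answer). It uses `math.ulp` (Python 3.9+) to show that adjacent doubles are more widely spaced near 1 than near a small value, and `math.isclose` to accept a sum that lies within a tolerance of 1. The value `2.0 ** -60`, the seed `0`, and the tolerance `1e-12` are arbitrary choices for illustration.

```python
import math
import numpy as np

# Spacing between adjacent doubles: coarser near 1.0, finer near small values.
print(math.ulp(1.0))    # equals 2**-52
print(math.ulp(0.001))  # much smaller than the spacing near 1.0

# A small value can be exactly representable while 1 minus it is not.
tiny = 2.0 ** -60          # arbitrary small value chosen for illustration
print(1.0 - tiny == 1.0)   # True: the difference rounds back to 1.0

# Accepting small differences: compare the sum to 1 with a tolerance.
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
probs = rng.random(10)
probs = probs / probs.sum()
total = float(probs.sum())
print(math.isclose(total, 1.0, abs_tol=1e-12))  # True
```

A tolerance check like this is usually enough in practice: the rounding error in a sum of ten normalized doubles is many orders of magnitude smaller than `1e-12`.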