How to avoid roundoff errors in numpy.random.choice?

Question

Say x_1, x_2, ..., x_n are n objects and one wants to pick one of them so that the probability of choosing x_i is proportional to some number u_i. Numpy provides a function for that:

x, u = np.array([x_1, x_2, ..., x_n]), np.array([u_1, ..., u_n])
np.random.choice(x, p = u/np.sum(u))

However, I have observed that this code sometimes throws a ValueError saying "probabilities do not sum to 1.". This is probably due to the round-off errors of finite precision arithmetic. What should one do to make this function work properly?

[similar question](https://stackoverflow.com/q/46539431/18238422) — Pychopath, Feb 25 '22 at 08:57
@Mortz exactly this: "ValueError: probabilities do not sum to 1" — Fırat Kıyak, Mar 03 '22 at 20:18
And does the solution to the question pointed out by @Pychopath help? — Mortz, Mar 08 '22 at 15:57
@Mortz https://stackoverflow.com/a/60386427/6087087 provides a solution. numpy.random.multinomial (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.multinomial.html) automatically adjusts the last probability to solve the issue but it is noted that this should not be relied upon. Other answers, do not give a satisfactory answer. For example, the accepted solution to that question https://stackoverflow.com/a/46539921/6087087 suggests to normalize the probabilities, which may fail to solve the problem due to roundoff errors. See the comment by pd shah to that answer. — Fırat Kıyak, Mar 08 '22 at 18:56
It all really begs the question why numpy doesn't just do this stuff internally. I mean a key point of numpy is to make it easy to do complex numerical calculations without having to be an expert in IEEE-754 roundoff bs. — Leopd, Sep 14 '22 at 22:25

score 5 · Accepted Answer · answered Mar 08 '22 at 19:13

5

After reading the answer https://stackoverflow.com/a/60386427/6087087 to the question pointed by @Pychopath, I have found the following solution, inspired by the documentation of numpy.random.multinomial https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.multinomial.html

Say p is the array of probabilities which may not be exactly 1 due to roundoff errors, even if we normalized it with p = p/np.sum(p). This is not rare, see the comment by @pd shah at the answer https://stackoverflow.com/a/46539921/6087087.

Just do

p[-1] = 1 - np.sum(p[0:-1])
np.random.choice(x, p = p)

And the problem is solved! The roundoff errors due to subtraction will be much smaller than roundoff errors due to normalization. Moreover, one need not worry about the changes in p, they are of the order of roundoff errors.

answered Mar 08 '22 at 19:13

Fırat Kıyak

480
1
6
18

1

Better to use `p[-1] = max(0, 1 - np.sum(p[0:-1]))` because sometimes roundoff errors will cause that final number to be negative (like -1e-16) which will also `np.random.choice` to fail, but with `ValueError: probabilities are not non-negative` – Leopd Sep 14 '22 at 22:07
Oh well... here's the source code generating that error, but I'm not sure what's the best way to fix the problem, or why numpy hasn't yet fixed it. https://github.com/numpy/numpy/blob/main/numpy/random/mtrand.pyx – Reza Roboubi Jan 15 '23 at 06:38
1

OK, my problem seems to have been resolved with: p = np.array(p, dtype=numpy.float64), i.e. type conversion. I was using a Jax array. My bad. – Reza Roboubi Jan 15 '23 at 06:56

score 0 · Answer 2 · answered Feb 25 '22 at 07:55

0

According to NumPy documentation we have to use p1-D array-like. So i think if u-array is array of probabilities then you can try it:

x, u = np.array([x_1, x_2, ..., x_n]), np.array([u_1, ..., u_n])
np.random.choice(x, p = u)

or

x, u = np.array([x_1, x_2, ..., x_n]), np.array([u_1, ..., u_n])
s = sum(u)
u1 = [i/s for i in u]
np.random.choice(x, p = u1)

answered Feb 25 '22 at 07:55

vovakirdan

345
3
11

This does not answer my question. The second code is almots the same one I posted. I am worried about the cumulative errors occuring due to finite precision arithmetic during divisions. It may lead to sum of probabilities not being equal to (exactly) 1. – Fırat Kıyak Mar 03 '22 at 20:20

How to avoid roundoff errors in numpy.random.choice?

2 Answers2

Linked