26

How can I use np.random.choice here? I have a list p that is calculated by some operation, for example:

 p=[  1.42836755e-01,   1.42836735e-01  , 1.42836735e-01,   1.42836735e-01
,   4.76122449e-05,   1.42836735e-01  , 4.76122449e-05  , 1.42836735e-01,
   1.42836735e-01,   4.76122449e-05]

Usually the sum of p is not exactly equal to 1:

>>> sum(p)
1.0000000017347

I want to make a random choice using the probabilities in p:

>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([4, 3, 2, 9])

This works here, but in my program it raises an error:

Traceback (most recent call last):
    indexs=np.random.choice(range(len(population)), population_number, p=p, replace=False)
  File "mtrand.pyx", line 1141, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:17808)
ValueError: probabilities do not sum to 1

If I print p:

[  4.17187500e-05   2.49937500e-01   4.16562500e-05   4.16562500e-05
   2.49937500e-01   4.16562500e-05   4.16562500e-05   4.16562500e-05
   2.49937500e-01   2.49937500e-01]

But it works in the Python shell with this p:

>>> p=[  4.17187500e-05 ,  2.49937500e-01   ,4.16562500e-05  , 4.16562500e-05,
   2.49937500e-01  , 4.16562500e-05  , 4.16562500e-05  , 4.16562500e-05,
   2.49937500e-01   ,2.49937500e-01]
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([ 9, 10,  2,  5])

UPDATE: I have tested it with precision=15:

 np.set_printoptions(precision=15)
 print(p)
[  2.499375625000002e-01   2.499375000000000e-01   2.499375000000000e-01
   4.165625000000000e-05   4.165625000000000e-05   4.165625000000000e-05
   4.165625000000000e-05   4.165625000000000e-05   2.499375000000000e-01
   4.165625000000000e-05]

Testing:

>>> p=np.array([  2.499375625000002e-01   ,2.499375000000000e-01   ,2.499375000000000e-01,
   4.165625000000000e-05   ,4.165625000000000e-05,   4.165625000000000e-05,
   4.165625000000000e-05  , 4.165625000000000e-05 ,  2.499375000000000e-01,
   4.165625000000000e-05])
>>> np.sum(p)
1.0000000000000002

How can I fix this so that I can use np.random.choice?

pd shah
  • Try printing `[repr(x) for x in p]` and, if `p` is a numpy array, `p.dtype`. Despite the common belief it is not always possible to recreate a sequence of floats just from the output of `print`. – Stop harming Monica Oct 03 '17 at 08:02
  • thx. how can I use np.random.choice here? – pd shah Oct 03 '17 at 08:03
  • It works for me. You need to work harder to create a [mcve]. – Stop harming Monica Oct 03 '17 at 08:22
  • >>> p=np.array([0.1999600079984003, 0.1999600079984003, 0.1999600079984003, 3.9992001599680064e-05, 0.1999600079984003, 3.9992001599680064e-05, 3.9992001599680064e-05, 0.1999600079984003, 3.9992001599680064e-05, 3.9992001599680064e-05]) >>> np.sum(p) 0.99999999999999978 – pd shah Oct 03 '17 at 08:30
  • I do not see why you keep posting examples that **do not** trigger the error. They are not useful to solve your problem. – Stop harming Monica Oct 03 '17 at 08:43

4 Answers

28

This is a known issue with numpy. np.random.choice checks that the probabilities sum to 1 within a given tolerance (see the source).
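
To make the tolerance concrete, here is a minimal sketch of the kind of check involved. It assumes the tolerance is the square root of the float64 machine epsilon (about 1.5e-8), which is my understanding of the numpy source and may differ between versions. `sums_to_one` is a hypothetical helper, not a numpy function.

```python
import numpy as np

# Assumed tolerance: sqrt of float64 machine epsilon, roughly 1.49e-8.
atol = np.sqrt(np.finfo(np.float64).eps)

def sums_to_one(p, atol=atol):
    """Hypothetical helper: is sum(p) within `atol` of 1?"""
    p = np.asarray(p, dtype=np.float64)
    return bool(abs(p.sum() - 1.0) <= atol)

# The list from the question: its sum is off by about 1.7e-9.
p_ok = [1.42836755e-01, 1.42836735e-01, 1.42836735e-01, 1.42836735e-01,
        4.76122449e-05, 1.42836735e-01, 4.76122449e-05, 1.42836735e-01,
        1.42836735e-01, 4.76122449e-05]
# Same list with the last entry changed: the sum is off by about 3.0e-7.
p_bad = p_ok[:-1] + [4.79122449e-05]

print(sums_to_one(p_ok))   # True, inside the assumed tolerance
print(sums_to_one(p_bad))  # False, outside the assumed tolerance
```

Under that assumption, a deviation in the ninth decimal place is accepted, while one in the seventh is rejected.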

The solution is to normalize the probabilities by dividing them by their sum, provided the sum is already close enough to 1.

Example:

>>> p=[  1.42836755e-01,   1.42836735e-01  , 1.42836735e-01,   1.42836735e-01
,   4.76122449e-05,   1.42836735e-01  , 4.76122449e-05  , 1.42836735e-01,
   1.42836735e-01,   4.79122449e-05]
>>> sum(p) 
1.0000003017347 # over tolerance limit
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)

Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
  File "mtrand.pyx", line 1417, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:15985)
ValueError: probabilities do not sum to 1

With normalization:

>>> p = np.array(p)
>>> p /= p.sum()  # normalize
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([8, 4, 1, 6])
user2314737
  • 7
    Thanks, but it does not work. ValueError: probabilities do not sum to 1. What should I do? – pd shah Oct 03 '17 at 08:17
  • @pdshah have you tried normalizing the probabilities by `p /= p.sum()`? – user2314737 Oct 03 '17 at 08:25
  • yes: >>> p=np.array([0.1999600079984003, 0.1999600079984003, 0.1999600079984003, 3.9992001599680064e-05, 0.1999600079984003, 3.9992001599680064e-05, 3.9992001599680064e-05, 0.1999600079984003, 3.9992001599680064e-05, 3.9992001599680064e-05]) >>> np.sum(p) 0.99999999999999978 >>> p /= p.sum() >>> np.sum(p) 1.0000000000000002 – pd shah Oct 03 '17 at 08:35
  • @pdshah ok the sum is still not exactly one, but does `np.random.choice` work? – user2314737 Oct 03 '17 at 08:40
  • First thing I thought to do as well. but it did not work – Michael Tamillow Jul 12 '21 at 03:49
  • This may not work due to round-off errors accumulated due to division. See my answer at https://stackoverflow.com/a/71400320/6087087 for a definitive solution. – Fırat Kıyak Mar 08 '22 at 19:54
15

Convert it to float64:

p = np.asarray(p).astype('float64')
p = p / np.sum(p)
np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)

This was inspired by another post: How can I avoid value errors when using numpy.random.multinomial?
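
The conversion and normalization can be wrapped in a small helper. This is only a sketch: `choice_with_weights` is a hypothetical name, and the validation it performs (non-negative weights, positive total) is my own assumption about what is useful here. Casting to float64 first also covers integer weights such as [1, 1, 1], where an in-place `p /= p.sum()` cannot store fractional results in an integer array.

```python
import numpy as np

def choice_with_weights(items, size, weights, replace=False):
    """Draw `size` items with probabilities proportional to `weights`."""
    p = np.asarray(weights, dtype=np.float64)  # force double precision
    if p.ndim != 1 or len(p) != len(items):
        raise ValueError("weights must be 1-D and match items in length")
    if np.any(p < 0) or p.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    p = p / p.sum()  # out-of-place division, result stays float64
    return np.random.choice(items, size, replace=replace, p=p)

# Integer weights are fine because of the float64 cast above.
picked = choice_with_weights([1, 2, 3], 2, [1, 1, 1])
print(picked)
```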

Soid
  • 1
    IMO this should have more votes. None of the answers worked for my case when my p = [1,1,1], this one did. Thank you! – Jan Pisl Aug 05 '22 at 12:08
5

ValueError: probabilities do not sum to 1

This is a known numpy bug. The error occurs when floating-point arithmetic is not precise enough for the probabilities to sum to exactly 1. Sometimes they sum to something like 0.9999999999997 or 1.0000000000003, which can make np.random.choice() fail.

There is a workaround: np.random.multinomial(). This method handles the probabilities more gracefully and does not need them to sum to exactly 1.0. From its documentation:

pvals : sequence of floats, length p
    Probabilities of each of the p different outcomes. These should sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[:-1]) <= 1).

For example, suppose I have some choices and the normalized_weights associated with them.

np.random.multinomial() draws 20 times based on the normalized_weights and returns how many times each choice was chosen.

choices = [......]
weights = np.array([......])
normalized_weights = weights / np.sum(weights)

number_of_choices = 20
resample_counts = np.random.multinomial(number_of_choices,
                                        normalized_weights)

chosen = []
resample_index = 0
for resample_count in resample_counts:
    for _ in range(resample_count):
        chosen.append(choices[resample_index])
    resample_index += 1
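
For a version that runs as-is, here is a sketch with made-up data: the `choices` and `weights` values are placeholders I invented, not values from the answer. It uses `np.repeat` in place of the nested loop to expand the counts. Note that multinomial sampling draws with replacement, so the same choice can appear more than once; it is not a drop-in substitute for `replace=False`.

```python
import numpy as np

choices = np.array(["a", "b", "c", "d"])   # placeholder data
weights = np.array([3.0, 1.0, 0.5, 0.5])   # placeholder, not normalized
normalized_weights = weights / weights.sum()

number_of_choices = 20
# One count per choice; the counts add up to number_of_choices.
resample_counts = np.random.multinomial(number_of_choices, normalized_weights)

# Repeat each choice as many times as it was drawn.
chosen = np.repeat(choices, resample_counts)
print(resample_counts)
print(chosen)
```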
Yu N.
3

One way to see the difference is:

numpy.set_printoptions(precision=15)
print(p)

This will perhaps show you that your 4.17187500e-05 is actually 4.17187500005e-05. See the manual here.
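
A small sketch of the effect, using the value mentioned above: the same array is formatted once with a low and once with a high precision, and `repr` on each element gives a string that round-trips exactly. The second array element is an arbitrary value I added for illustration, and `np.array2string` is used only to avoid changing the global print options.

```python
import numpy as np

p = np.array([4.17187500005e-05, 2.499375e-01])

short = np.array2string(p, precision=8)   # same precision as the default print
full = np.array2string(p, precision=17)   # enough digits for any float64
print(short)  # the trailing ...005 is rounded away
print(full)   # the extra digits are visible

# repr of a Python float round-trips exactly, so it is safe to copy and paste.
exact = [repr(float(x)) for x in p]
print(exact)
```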

Ken Y-N