
I am generating random numbers with scipy.stats, using the Poisson distribution. Below is an example:

import scipy.stats as sct

A = 2.5
Pos = sct.poisson.rvs(A, size=20)

When I print Pos, I get the following numbers:

array([1, 3, 2, 3, 1, 2, 1, 2, 2, 3, 6, 0, 0, 4, 0, 1, 1, 3, 1, 5])

You can see from the array that some larger numbers, such as 6, are generated.

What I want to do is limit the largest number (let's say to 5), i.e. any random number generated using sct.poisson.rvs should be less than or equal to 5.

How can I tweak my code to achieve this? By the way, I am using this in a Pandas DataFrame.

Zephyr
  • You can't control the random number distribution, unless you manually alter the numbers after you get the array, which is trivial. Otherwise you may want to look into other distributions which are limited, such as beta. – user2974951 Sep 19 '18 at 06:59
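For reference, the "manually alter the numbers" route mentioned in the comment could look like the sketch below. The name `cap` and the fixed `random_state` are assumptions added for illustration. Note that capping is not the same as truncating: every draw above the limit is turned into the limit itself, so the value 5 ends up over-represented compared with a Poisson distribution restricted to 0..5.

```python
import numpy as np
import scipy.stats as sct

A = 2.5
cap = 5  # hypothetical name for the upper limit from the question

# random_state is fixed here only so the sketch is repeatable
Pos = sct.poisson.rvs(A, size=20, random_state=0)

# Replace every value above cap with cap; values <= cap are unchanged
capped = np.minimum(Pos, cap)
print(capped)
```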

2 Answers


I think the solution is quite simple (assuming I understood your issue correctly):

# for repeatability:
import numpy as np
np.random.seed(0)

from scipy.stats import poisson, uniform
sample_size = 20
maxval = 5
mu = 2.5

cutoff = poisson.cdf(maxval, mu)
# generate uniform distribution [0, cutoff):
u = uniform.rvs(scale=cutoff, size=sample_size)
# convert to Poisson:
truncated_poisson = poisson.ppf(u, mu)

Then print(truncated_poisson) gives:

[2. 3. 3. 2. 2. 3. 2. 4. 5. 2. 4. 2. 3. 4. 0. 1. 0. 4. 3. 4.]
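
As a sanity check on the approach above, the sketch below compares the empirical frequencies of a large sample against the conditional PMF P(X = k | X <= maxval) = pmf(k) / cdf(maxval). The sample size, the `random_state` argument, and the final clipping step are assumptions added for this check, not part of the answer's code.

```python
import numpy as np
from scipy.stats import poisson, uniform

mu, maxval = 2.5, 5
cutoff = poisson.cdf(maxval, mu)

# Target PMF of the truncated distribution on 0..maxval
k = np.arange(maxval + 1)
target_pmf = poisson.pmf(k, mu) / cutoff

# Large sample drawn with the same inverse-CDF construction
u = uniform.rvs(scale=cutoff, size=200_000, random_state=0)
sample = poisson.ppf(u, mu)
# Defensive clip (assumption): guards against edge cases such as u == 0
sample = np.clip(sample, 0, maxval).astype(int)

empirical = np.bincount(sample, minlength=maxval + 1) / sample.size
print(np.round(target_pmf, 3))
print(np.round(empirical, 3))
```

The two printed rows should agree closely, which is what one would expect if the sample follows the Poisson distribution conditioned on being at most maxval.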
AGN Gazer
  • Dear AGN, thanks for the advice, and sorry for my late reply. – Zephyr Sep 24 '18 at 02:51
  • I was wondering why this method gives a similar sequence of random numbers in multiple runs (even without `np.random.seed(0)`)? – mdslt Jun 21 '22 at 02:42
  • I commented out the `np.random.seed(0)` line and re-ran the _entire code_ in my answer and I got a different sequence. I cannot reproduce your issue. Maybe you could provide a more detailed description of exactly how you are running the code? – AGN Gazer Jun 21 '22 at 03:20
  • This one should be the accepted answer. It is more efficient, and also uses quantiles and a base measure which will help the programmer understand more math for the future. – Josh Albert Sep 26 '22 at 21:07

What you want could be called the truncated Poisson distribution, except that in the common usage of this term, truncation happens from below instead of from above. The easiest, even if not always the most efficient, way to sample a truncated distribution is to draw twice the requested number of values and keep only the elements that fall in the desired range; if there are not enough, double the size again, and so on, as shown below:

import scipy.stats as sct

def truncated_Poisson(mu, max_value, size):
    temp_size = size
    while True:
        temp_size *= 2
        temp = sct.poisson.rvs(mu, size=temp_size)
        truncated = temp[temp <= max_value]
        if len(truncated) >= size:
            return truncated[:size]

mu = 2.5
max_value = 5
print(truncated_Poisson(mu, max_value, 20))

Typical output: [0 1 4 5 0 2 3 2 2 2 5 2 3 3 3 3 4 1 0 3].
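
How efficient the rejection loop is depends on the probability that a single draw is kept, which equals the Poisson CDF at max_value. A minimal sketch of that calculation follows; the second parameter pair (mu = 20) is a made-up example chosen only to show a case where almost every draw is rejected.

```python
from scipy.stats import poisson

mu, max_value = 2.5, 5

# Probability that one Poisson(mu) draw is <= max_value, i.e. is accepted
accept_prob = poisson.cdf(max_value, mu)
print(accept_prob)  # roughly 0.96 for mu = 2.5, max_value = 5

# Hypothetical unfavourable case: the cap lies far below the mean
low_accept = poisson.cdf(5, 20)
print(low_accept)   # well below 0.001, so the loop would double many times
```

With the parameters from the question almost all draws are kept, so the first doubling is nearly always enough. When the cap is far below the mean, the inverse-CDF approach from the other answer is likely the better choice.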

  • Thanks for the advice, and sorry for the late reply. I think this function works and suits my application better, because I am using it in a DataFrame. – Zephyr Sep 24 '18 at 02:52
  • Hi @Welcome to Stack, I was using this function on a Pandas DataFrame and it gave me the following error: ValueError: size does not match the broadcast shape of the parameters. The DataFrame contains 10 rows and 13 columns. I am trying to create a new column using the truncated_Poisson function. How would I do this? Below is the code for the new column: UCL_Fix_Dub['Team1_goals'] = truncated_Poisson(UCL_Fix_Dub.Team1_XG, max_goal, 1) – Zephyr Sep 24 '18 at 05:02
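
The error in the last comment appears to come from passing a whole column as mu while asking for a fixed size: the function doubles size to 2, which cannot be broadcast against 10 per-row means. One possible way around it, sketched below, is to draw one value per row with the inverse-CDF idea from the other answer, since scipy's cdf and ppf accept array-valued mu. The helper name `truncated_poisson_rowwise`, the `rng` parameter, and the sample `xg` values are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.stats import poisson

def truncated_poisson_rowwise(mu, max_value, rng=None):
    """Hypothetical helper: one draw per entry of mu, each <= max_value."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    cutoff = poisson.cdf(max_value, mu)       # per-row P(X <= max_value)
    u = rng.uniform(size=mu.shape) * cutoff   # uniform on [0, cutoff) per row
    draws = poisson.ppf(u, mu)                # invert the CDF row by row
    # Defensive clip: guards against edge cases such as u == 0
    return np.clip(draws, 0, max_value).astype(int)

# Made-up values standing in for a column such as Team1_XG
xg = np.array([0.8, 1.4, 2.5, 3.1, 0.3])
goals = truncated_poisson_rowwise(xg, 5, rng=np.random.default_rng(0))
print(goals)
```

With a DataFrame, something like `UCL_Fix_Dub['Team1_goals'] = truncated_poisson_rowwise(UCL_Fix_Dub['Team1_XG'].to_numpy(), max_goal)` should then assign one truncated draw per row, assuming every expected-goals value is positive.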