SciPy: generating custom random variable from PMF

Question

I'm trying to generate random variables according to a certain ugly distribution, in Python. I have an explicit expression for the PMF, but it involves some products which makes it unpleasant to obtain and invert the CDF (see below code for explicit form of PMF).

In essence, I'm trying to define a random variable in Python by its PMF and then have built-in code do the hard work of sampling from the distribution. I know how to do this if the support of the RV is finite, but here the support is countably infinite.

The code I am currently trying to run as per @askewchan's advice below is:

import scipy as sp
import numpy as np

class x_gen(sp.stats.rv_discrete):
    def _pmf(self,k,param):
        num = np.arange(1+param, k+param, 1)
        denom = np.arange(3+2*param, k+3+2*param, 1)

        p = (2+param)*(np.prod(num)/np.prod(denom))

        return p

pa_limit = limitrv_gen()
print pa_limit.rvs(alpha,n=1)

However, this returns the error while running:

File "limiting_sim.py", line 42, in _pmf
    num = np.arange(1+param, k+param, 1)
TypeError: only length-1 arrays can be converted to Python scalars

Basically, it seems that the np.arange() list isn't working somehow inside the def _pmf() function. I'm at a loss to see why. Can anyone enlighten me here and/or point out a fix?

EDIT 1: cleared up some questions by askewchan, edits reflected above.

EDIT 2: askewchan suggested an interesting approximation using the factorial function, but I'm looking more for an exact solution such as the one that I'm trying to get work with np.arange.

What do you mean by "take" in "the RV takes only finitely many values, but the distribution I have can take arbitrarily large positive values"? Do you mean that the pmf has indefinite number of parameters? Or do you mean that elements in the population have no upper bound? — askewchan, Apr 24 '14 at 18:13
@askewchan: the random variable takes arbitrarily large values, the pmf has one parameter only. In other words if X is the random variable, then P(X=k) > 0 for arbitrarily large k. — gogurt, Apr 24 '14 at 18:22
If I understand you correctly, then I think that simply means that `b = inf` which is actually the default behavior for `rv_discrete`. I assume that the integral of your pmf is finite (and 1). — askewchan, Apr 24 '14 at 18:25
@askewchan: Thanks. I guess what I'm asking for is how exactly to supply the pmf to the rv_discrete method. I know how to do this if I just have a finite list of values and probabilities, but am not exactly sure how to make it work for just an arbitrary function. Sorry if this is too elementary... — gogurt, Apr 24 '14 at 18:39
You get that error because `pmf` must be able to accept an array for `k` and return an array of the same shape for `p`. If you need help doing that I'll edit my answer. — askewchan, Apr 24 '14 at 20:09
@askewchan: Hmm. I'm not sure I quite follow. What sort of array are we talking about here? Theoretically, k can be anything from 0 to arbitrarily large, so it's not clear what to do here. I'd love it if you could go into a little more depth! — gogurt, Apr 24 '14 at 20:20

askewchan · Answer 1 · 2014-04-24T21:10:06.443

4

You should be able to subclass rv_discrete like so:

class mydist_gen(rv_discrete):
    def _pmf(self, n, param):
        return yourpmf(n, param)

Then you can create a distribution instance with:

mydist = mydist_gen()

And generate samples with:

mydist.rvs(param, size=1000)

Or you can then create a frozen distribution object with:

mydistp = mydist(param)

And finally generate samples with:

mydistp.rvs(1000)

With your example, this should work, since factorial automatically broadcasts. But, it might fail for large enough alpha:

import scipy as sp
import numpy as np
from scipy.misc import factorial

class limitrv_gen(sp.stats.rv_discrete):
    def _pmf(self, k, alpha):
        #num = np.prod(np.arange(1+alpha, k+alpha))
        num = factorial(k+alpha-1) / factorial(alpha)
        #denom = np.prod(np.arange(3+2*alpha, k+3+2*alpha))
        denom = factorial(k + 2 + 2*alpha) / factorial(2 + 2*alpha)

        return (2+alpha) * num / denom

pa_limit = limitrv_gen()
alpha = 100
pa_limit.rvs(alpha, size=10)

edited Apr 24 '14 at 21:10

answered Apr 24 '14 at 18:41

askewchan

45,161
17
118
134

This definitely seems like it should work. But there's another problem that popped up. I'll add it to the main body but I upvoted anyways. Thanks. – gogurt Apr 24 '14 at 19:21
Maybe if you post the form of the pmf I or others could test possible solutions. – askewchan Apr 24 '14 at 19:32
Hmm. I'm still getting the exact same error about "only length-1 arrays..." If it's running OK for you though, then I'm probably just doing something wrong on my end. I'll think about it. – gogurt Apr 24 '14 at 21:25
Strange, that error is almost certainly from `np.arange`, which I haven't used in my solution... what value are you using for `alpha` (and did I assume correctly that `alpha` is your single `param`?) Sometimes I get an error for large values in the factorial, but not for the length-1 arrays thing. Are you sure that's not from `np.arange`? – askewchan Apr 24 '14 at 21:38
So, I finally got it to work. Thanks so much for your help. But I'm realizing now that the factorial approximation is too rough for what I need this function to do. Sigh. If possible, I'd still like to figure out a way to do this using explicit multiplication. Still, this is a great answer though. – gogurt Apr 24 '14 at 23:42
1

use `scipy.special.gammaln` instead of the `factorial`, use sums and exp at the end, and the precision will be much higher. – Josef Apr 25 '14 at 02:43

SciPy: generating custom random variable from PMF

1 Answers1

Linked