0

I defined the following custom probability distribution:

import scipy.stats as st

# parameters
a = 3 / 16
b = 1

class linear_fractional(st.rv_discrete):
    def _pdf(self, n):
        if (n == 0):
            return (a + b - 1) / (a + b)
        else:
            return (a * b ** (n - 1)) / (a + b) ** (n + 1)

LF = linear_fractional()
LF.rvs()

When I let my script run I get a lengthy error message:

Traceback (most recent call last):
File "C:/Users/thoma/PycharmProjects/Host_Parasite_Coevolution/Asymptotics.py", line 17, in <module> LF.rvs()
File "C:\Users\thoma\AppData\Local\Programs\Python\Python37-32\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 2969, in rvs
    return super(rv_discrete, self).rvs(*args, **kwargs)

...

RecursionError: maximum recursion depth exceeded while calling a Python object

If I do LF.mean() instead, I get

Fatal Python error: Cannot recover from stack overflow.

Does anyone know why that is and how I could solve this issue? Do I have to define an upper bound on my probability distribution?

JohanC
  • 71,591
  • 8
  • 33
  • 66
MMM
  • 373
  • 1
  • 4
  • 12

1 Answers1

2

Following the examples given at the docs and this post, the approach needs some modification. Importantly, as is it a discrete distribution, _pmf should be used instead of _pdf. Also, the _pmf will get called by numpy-style arrays for n, for which n == 0 works quite differently.

As (a * b ** (n - 1)) / (a + b) ** (n + 1) is equal to (a + b - 1) / (a + b) when n == 0, we can just use that first expression for all n. However, numpy generates an error when b is an integer and n = -1. Multiplying b with 1.0 changes it to a float for which numpy doesn't give such error. If the same parameters a and b are used multiple times, a frozen distribution could be generated.

Here is an example, which creates a histogram of the generated samples, and compares it with the pmf.

import scipy.stats as st
import numpy as np
from matplotlib import pyplot as plt

class linear_fractional(st.rv_discrete):
    def _pmf(self, n, a, b):
        return (a * (1.0 * b) ** (n - 1)) / (a + b) ** (n + 1)

# parameters
a = 3 / 16
b = 1

LF = linear_fractional()

N = 10000
plt.hist(LF.rvs(a, b, size=N), bins=np.arange(-0.5, 50), ec='w', label='histogram of samples')
plt.plot(LF.pmf(np.arange(50), a, b) * N, 'ro', label='probability mass function (scaled)')
plt.legend(title=f'$a={a}; b={b}$')
plt.autoscale(enable=True, axis='x', tight=True)
plt.show()

resulting histogram

LF.mean(a, b) outputs 5.33333333333286

A scatter plot is an alternative way to illustrate samples from a distribution:

plt.scatter(np.random.uniform(0, 1, N), LF.rvs(a, b, size=N), marker=',', alpha=0.2, lw=0, s=1, color='crimson')

scatter plot

PS: When b=1, the formula for this distribution is equal to the geometric distribution with p = a/(a+1) and subtracting 1. This is much faster as it is completely calculated inside numpy.

samples = np.random.geometric(a/(a+1), size=1000) - 1
JohanC
  • 71,591
  • 8
  • 33
  • 66