
I am working with a beta distribution in PyMC3 whose parameters I get from scipy. From my understanding, the loc and scale that scipy returns when running stats.beta.fit are optional.

In PyMC3's API, the beta distribution takes only alpha and beta as parameters. So if I want to fit a beta distribution in scipy and then pass it to the beta distribution in PyMC3, I thought I could use the following formulation, fixing the location at 0:

import numpy as np
from scipy import stats

data = np.array([2001.71169931, 1952.12419181, 2008.46912701, 2133.96174745,
       2035.70369275, 2010.56689658, 2151.40630534, 2026.67386354,
       1973.36879614, 2113.31952901, 1978.31670043, 1990.21473284,
       2095.62905113, 2238.99892624, 2131.04027332, 2059.30645903,
       1947.33063426, 2023.13299349, 2211.05988933])

# Free fit: returns (a, b, loc, scale)
beta_fit = stats.beta.fit(data)

# Location fixed at 0; a, b and scale are still fitted
beta_fit_alt = stats.beta.fit(data, floc=0)
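To see what loc and scale actually do, here is a small sketch. The shape, location and scale values are made up for illustration and are not taken from the fits above. It checks that scipy's four-parameter beta is simply the standard beta on [0, 1] shifted by loc and stretched by scale, so its support is [loc, loc + scale].

```python
import numpy as np
from scipy import stats

# Hypothetical parameter values, chosen only for illustration.
a, b, loc, scale = 2.0, 5.0, 1900.0, 400.0

# Points inside the shifted support [loc, loc + scale].
x = np.linspace(1950.0, 2250.0, 7)

# scipy's four-parameter beta: X = loc + scale * Y with Y ~ Beta(a, b) on [0, 1].
cdf_four_param = stats.beta.cdf(x, a, b, loc=loc, scale=scale)
# The same probabilities from the standard beta after undoing the shift and stretch.
cdf_standard = stats.beta.cdf((x - loc) / scale, a, b)

# Support endpoints of the four-parameter version: loc and loc + scale.
lower, upper = stats.beta.support(a, b, loc=loc, scale=scale)

print(np.allclose(cdf_four_param, cdf_standard))  # True
print(lower, upper)
```

A standard beta, which is what PyMC3's alpha/beta parametrisation describes, is the special case loc=0, scale=1, so it can only place mass on [0, 1].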

Now to test whether these distributions are similar, I draw random samples from them and take the median (this is just a rough approximation of similarity, but good enough for the purpose):

print(np.median(stats.beta.rvs(*beta_fit, size=1000)))
2040.86
print(np.median(stats.beta.rvs(*beta_fit_alt, size=1000)))
6.68e-29
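Because both fits are ordinary scipy distributions, their medians and supports can be computed exactly rather than estimated from 1000 random draws. The helper name describe_fit and the two parameter tuples below are hypothetical stand-ins; in practice you would pass beta_fit and beta_fit_alt.

```python
from scipy import stats


def describe_fit(params):
    """Return (lower, upper, median) for a four-parameter scipy beta.

    params is an (a, b, loc, scale) tuple such as the one stats.beta.fit returns.
    """
    dist = stats.beta(*params)  # frozen distribution with all four parameters
    lower, upper = dist.support()  # equals (loc, loc + scale)
    return lower, upper, dist.median()


# Hypothetical tuples standing in for beta_fit and beta_fit_alt.
example_fits = {
    "free loc": (3.0, 4.0, 1900.0, 400.0),
    "floc=0": (0.5, 8.0, 0.0, 2500.0),
}

for name, params in example_fits.items():
    lower, upper, median = describe_fit(params)
    print(f"{name}: support=[{lower:.1f}, {upper:.1f}], median={median:.4g}")
```

Looking at the support alongside the median shows directly where each fit puts its mass, which sampled medians only hint at.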

The results are on vastly different scales, so I think I misunderstand the loc and scale parameters. I also tried passing fscale=0 when fitting beta_fit_alt, but it seems that floc and fscale are mutually exclusive arguments.
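A sketch of fixing both parameters at once, assuming a recent SciPy: the scale must be positive, and the standard beta corresponds to loc=0 and scale=1 (not 0). scipy.stats.beta.fit accepts floc and fscale together, in which case it fits only the two shape parameters. The helper fit_standard_beta and the synthetic Beta(2, 5) sample are illustrative only. The last call shows that data near 2000 is rejected, because it lies outside [0, 1].

```python
import numpy as np
from scipy import stats


def fit_standard_beta(x):
    """Hypothetical helper: fit only a and b, with loc=0 and scale=1 fixed."""
    return stats.beta.fit(x, floc=0, fscale=1)


# Synthetic data that already lies in (0, 1).
rng = np.random.default_rng(0)
unit_data = rng.beta(2.0, 5.0, size=5000)

a_hat, b_hat, loc_hat, scale_hat = fit_standard_beta(unit_data)
print(a_hat, b_hat, loc_hat, scale_hat)  # loc and scale come back as the fixed 0 and 1

# Values near 2000 cannot come from a beta on [0, 1], so scipy raises an error.
try:
    fit_standard_beta(np.array([1952.1, 2008.5, 2133.9]))
except ValueError as err:
    print("rejected:", err)
```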

Finally, I realize that I can do a transform in PyMC3 as specified here, but I would like to understand whether it's possible to simplify this and specify the distribution without the loc and scale parameters, since they are optional.
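If a standard beta is wanted (for example to pass only alpha and beta to PyMC3's Beta), one option is to rescale the data into (0, 1) before fitting. This sketch assumes hand-picked bounds lower=1900 and upper=2300; these are hypothetical modelling choices, not something scipy estimates here, and different bounds give different alpha and beta. It uses only scipy, not PyMC3.

```python
import numpy as np
from scipy import stats

# Data from the question.
data = np.array([2001.71169931, 1952.12419181, 2008.46912701, 2133.96174745,
                 2035.70369275, 2010.56689658, 2151.40630534, 2026.67386354,
                 1973.36879614, 2113.31952901, 1978.31670043, 1990.21473284,
                 2095.62905113, 2238.99892624, 2131.04027332, 2059.30645903,
                 1947.33063426, 2023.13299349, 2211.05988933])

# ASSUMPTION: hypothetical bounds chosen by hand to enclose the data.
lower, upper = 1900.0, 2300.0

# Map the data into (0, 1) and fit a standard beta (loc=0, scale=1 fixed).
unit = (data - lower) / (upper - lower)
alpha, beta_, _, _ = stats.beta.fit(unit, floc=0, fscale=1)

# The same distribution expressed on the original scale in scipy's parametrisation.
original_scale = stats.beta(alpha, beta_, loc=lower, scale=upper - lower)

# alpha and beta_ describe a beta on [0, 1]; values map back via lower + (upper - lower) * y.
print(alpha, beta_, original_scale.median())
```

Fitting the rescaled data this way gives the same alpha and beta as stats.beta.fit(data, floc=lower, fscale=upper - lower); the only question is whether the shift and stretch live in the data or in the distribution.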

matsuo_basho
    In addition to the usual α and β, `scipy.stats.beta` has location and scale parameters. To get the "standard" beta distribution (with support in [0, 1], as described in the PyMC3 documentation), you would fix the location to 0 and the scale to 1. But that doesn't make sense for your data, where all the values are close to 2000. Such data cannot come from a standard beta distribution, so to model it with beta, you'll need to allow for some combination of location and scale. – Warren Weckesser Apr 03 '22 at 18:13
