My data follow a Gaussian distribution, but only within a known range of values [xa, xb], so I want to fit a truncated normal distribution with scipy.stats.truncnorm while making use of that known range. My goal is to find loc and scale.
I don't understand how to fix xa and xb in the fit. The shape parameters are 'a' and 'b', but they depend on loc and scale, which are my unknowns: a = (xa - loc)/scale and b = (xb - loc)/scale. Moreover, it doesn't seem possible to give an initial guess for 'a' and 'b' (they can only be frozen with fa and fb?). When I do:
par = truncnorm.fit(r, a=a_guess, b=b_guess, scale= scale_guess, loc = loc_guess)
I get
Unknown arguments: {'a': 0.0, 'b': 2.4444444444444446}.
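To make concrete what I mean by fixing xa and xb: below is a minimal sketch of a direct maximum-likelihood fit in which only loc and scale are free, and a and b are recomputed from them at every step. The helper name fit_truncnorm_fixed_bounds and the choice of Nelder-Mead on log(scale) are my own assumptions for illustration, not something truncnorm.fit offers.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import truncnorm


def fit_truncnorm_fixed_bounds(data, xa, xb, loc_guess, scale_guess):
    """Hypothetical helper: MLE of loc and scale with known bounds [xa, xb]."""
    data = np.asarray(data, dtype=float)

    def neg_log_likelihood(params):
        loc, log_scale = params
        scale = np.exp(log_scale)  # optimise log(scale) so scale stays positive
        # The shape parameters follow from the fixed bounds and the current guess.
        a, b = (xa - loc) / scale, (xb - loc) / scale
        return -np.sum(truncnorm.logpdf(data, a, b, loc=loc, scale=scale))

    result = minimize(
        neg_log_likelihood,
        x0=[loc_guess, np.log(scale_guess)],
        method="Nelder-Mead",
    )
    loc_hat, log_scale_hat = result.x
    return loc_hat, np.exp(log_scale_hat)


# Synthetic data with the same values as in my example.
xa, xb = 30, 250
true_loc, true_scale = 50, 75
a, b = (xa - true_loc) / true_scale, (xb - true_loc) / true_scale
sample = truncnorm.rvs(a, b, loc=true_loc, scale=true_scale,
                       size=10000, random_state=0)

loc_hat, scale_hat = fit_truncnorm_fixed_bounds(
    sample, xa, xb, loc_guess=30, scale_guess=90
)
print(loc_hat, scale_hat)
```

This bypasses truncnorm.fit entirely, so I'd still like to know whether the same constraint can be expressed through the fit method itself.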
Also, the fits I get are very unstable. Here's an example:
import numpy as np
from scipy.stats import truncnorm
import matplotlib.pyplot as plt
xa, xb = 30,250
loc, loc_guess = 50, 30
scale, scale_guess = 75, 90
a,b = (xa-loc)/scale, (xb-loc)/scale
fig, ax = plt.subplots(1, 1)
x = np.linspace(xa,xb,10000)
ax.plot(x, truncnorm.pdf(x, a, b, loc=loc, scale=scale),
'r-', lw=5, alpha=0.6, label='truncnorm pdf')
r = truncnorm.rvs(a, b, loc=loc, scale=scale, size=10000)
par = truncnorm.fit(r, scale= scale_guess, loc = loc_guess)
ax.plot(x, truncnorm.pdf(x, *par),
'b-', lw=1, alpha=0.6, label='truncnorm fit')
ax.hist(r, density=True, histtype='stepfilled', alpha=0.3)
plt.legend()
plt.show()
I also often have this warning:
/home/elie/anaconda2/envs/py36/lib/python3.6/site-packages/scipy/stats/_continuous_distns.py:5823: RuntimeWarning: divide by zero encountered in log
  self._logdelta = np.log(self._delta)
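My guess at the cause, judging only from the line quoted in the warning: _delta looks like the probability mass of the standard normal between a and b, and it underflows to zero when the optimizer tries values of a and b far out in a tail. The snippet below only illustrates that guess; the formula norm.cdf(b) - norm.cdf(a) is my assumption and I have not checked it against the scipy source.

```python
import numpy as np
from scipy.stats import norm

# Shape parameters far in the upper tail, as an optimizer might try mid-fit.
a, b = 40.0, 50.0

# Assumed form of the normalising mass between a and b (not verified in scipy).
delta = norm.cdf(b) - norm.cdf(a)

# Both CDF values round to 1.0 in double precision, so the difference is 0.
with np.errstate(divide="ignore"):  # silence the divide-by-zero warning here
    log_delta = np.log(delta)

print(delta, log_delta)
```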