
Would somebody be able to explain to me how to use the location parameter with the gamma.fit function in SciPy?

It seems to me that a location parameter (μ) changes the support of the distribution from x ≥ 0 to y = x − μ ≥ 0, i.e. x ≥ μ. If μ is positive, then aren't we losing all the data that doesn't satisfy x − μ ≥ 0?

Thanks!
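A quick numerical check of the support shift described in the question, as a minimal sketch with arbitrary illustrative values (shape a = 1.5, μ = 2; neither comes from the original post). SciPy evaluates gamma.pdf(x, a, loc=μ) as the standard gamma density at x − μ, so points below μ get zero density:

```python
import numpy as np
from scipy.stats import gamma

a, mu = 1.5, 2.0                      # illustrative shape and location
x = np.array([0.5, 1.9, 2.5, 4.0])    # two points below mu, two above

# Shifted density: SciPy evaluates the standard gamma at y = x - mu.
shifted = gamma.pdf(x, a, loc=mu)
by_hand = gamma.pdf(x - mu, a)        # same thing, with the shift done manually

print(shifted)                        # first two entries are 0.0
print(np.allclose(shifted, by_hand))  # True

# Points below mu have zero density, so they contribute log(0) = -inf
# to the log-likelihood of any parameter set that puts loc above them.
loglik = gamma.logpdf(x, a, loc=mu).sum()
print(loglik)                         # -inf
```

That -inf is the reason a maximum-likelihood fit with a free loc will not place loc above any observation.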

Asked by emac; edited by Christian K.

1 Answer


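The fit function takes all of the data into consideration when finding a fit: by default gamma.fit maximizes the likelihood of every point, so when loc is left free it is never placed above the smallest observation and no data is thrown away. The flip side is that adding noise to your data will alter the fit parameters and can give a distribution that does not represent the data very well. So we have to be a bit clever when we are using fit.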
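A minimal sketch of that behaviour on synthetic data (a seeded NumPy generator with shape 2 and loc 2; these values are illustrative assumptions, not taken from the answer): with loc left free, the fit keeps every observation inside the support, so the estimated loc lands below the smallest data point.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)                          # seeded for reproducibility
data = rng.gamma(shape=2.0, scale=1.0, size=500) + 2.0  # true loc = 2

# loc is free here, so it is estimated together with shape and scale.
a_hat, loc_hat, scale_hat = gamma.fit(data)
print(a_hat, loc_hat, scale_hat)

# A point outside the support would make the likelihood zero, so the
# fitted loc stays below the minimum and no observation is discarded.
print(loc_hat < data.min())   # True
```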

Below is some code that generates data, y1, with shape=1, loc=2 and scale=1 using NumPy. It also adds uniform noise over the range 0 to 10 to create y2. Fitting y1 yields excellent results, but attempting to fit the noisy y2 is problematic: the noise we added smears out the distribution. However, we can also hold one or more parameters constant when fitting the data. In this case we pass floc=2 to the fit, which holds the location at 2 while the other parameters are fitted, returning much better results.

from scipy.stats import gamma
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, .1)
y1 = np.random.gamma(shape=1, scale=1, size=1000) + 2  # sets loc = 2
y2 = np.hstack((y1, 10*np.random.rand(100)))  # add noise from 0 to 10

# fit the distributions, get the PDF distribution using the parameters
shape1, loc1, scale1 = gamma.fit(y1)
g1 = gamma.pdf(x=x, a=shape1, loc=loc1, scale=scale1)

shape2, loc2, scale2 = gamma.fit(y2)
g2 = gamma.pdf(x=x, a=shape2, loc=loc2, scale=scale2)

# again fit the distribution, but force loc=2. SciPy rejects data at or
# below a fixed floc, so only the points with y2 > 2 are passed to the fit.
shape3, loc3, scale3 = gamma.fit(y2[y2 > 2], floc=2)
g3 = gamma.pdf(x=x, a=shape3, loc=loc3, scale=scale3)

And make some plots...

# plot the distributions and fits.  too lazy to do iteration today
# (density=True replaces normed=True, which newer Matplotlib releases removed)
fig, axes = plt.subplots(1, 3, figsize=(13, 4))
ax = axes[0]
ax.hist(y1, bins=40, density=True)
ax.plot(x, g1, 'r-', linewidth=6, alpha=.6)
ax.annotate('shape = %.3f\nloc = %.3f\nscale = %.3f' % (shape1, loc1, scale1), xy=(6, .2))
ax.set_title('gamma fit')

ax = axes[1]
ax.hist(y2, bins=40, density=True)
ax.plot(x, g2, 'r-', linewidth=6, alpha=.6)
ax.annotate('shape = %.3f\nloc = %.3f\nscale = %.3f' % (shape2, loc2, scale2), xy=(6, .2))
ax.set_title('gamma fit with noise')

ax = axes[2]
ax.hist(y2, bins=40, density=True)
ax.plot(x, g3, 'r-', linewidth=6, alpha=.6)
ax.annotate('shape = %.3f\nloc = %.3f\nscale = %.3f' % (shape3, loc3, scale3), xy=(6, .2))
ax.set_title('gamma fit w/ noise, location forced')

plt.tight_layout()
plt.show()

[Figure: histograms with fitted gamma PDFs; panels "gamma fit", "gamma fit with noise", "gamma fit w/ noise, location forced"]
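The question's concern does apply once loc is held fixed. A minimal sketch, assuming synthetic data similar to the answer's (the seeded generator and shape 2 are illustrative choices): SciPy's gamma.fit rejects data at or below a fixed floc (recent versions raise FitDataError, a ValueError subclass), so those points have to be dropped explicitly before fitting.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
clean = rng.gamma(shape=2.0, scale=1.0, size=1000) + 2.0  # true loc = 2
noise = 10 * rng.random(100)                              # uniform on [0, 10)
noisy = np.concatenate([clean, noise])

# With floc fixed, every point must satisfy x > floc.
rejected = False
try:
    gamma.fit(noisy, floc=2)
except ValueError as err:   # SciPy's FitDataError subclasses ValueError
    rejected = True
    print("fit rejected:", err)

# Keep only the points inside the support, then fit with loc held at 2.
kept = noisy[noisy > 2]
a_hat, loc_hat, scale_hat = gamma.fit(kept, floc=2)
print(a_hat, loc_hat, scale_hat)   # loc_hat is the fixed value 2
print(noisy.size - kept.size, "points dropped")
```

Dropping the points below the fixed loc is exactly the data loss the question describes; whether that is acceptable depends on whether those points really are noise.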

Answered by James

Comment: Thanks for the answer James, after some experimentation plus your walkthrough I think I understand it now. Great plots too. – emac Sep 05 '16 at 15:57