2

I'm having trouble obtaining the dispersion parameter of simulated data using statsmodels' GLM function.

import statsmodels.api as sm
import matplotlib.pyplot as plt 
import scipy.stats as stats 
import numpy as np

np.random.seed(1)

# Generate data
x=np.random.uniform(0, 100,50000)
x2 = sm.add_constant(x)
a = 0.5
b = 0.2
y_true = 1/(a+(b*x))
# Add error 
scale = 2 # the scale parameter I'm trying to obtain
shape = y_true/scale # given that, for Gamma, mu = scale*shape
y = np.random.gamma(shape=shape, scale=scale)

# Run model
model = sm.GLM(y, x2, family=sm.families.Gamma()).fit() 

model.summary()

Here's the summary from above: glm output

Note that the coefficient estimates are correct (0.5 and 0.2), but the scale (21.995) is way off the scale I set (2).

Can someone point out what it is I'm misunderstanding/doing wrong? Thanks!

Anthony S.
  • 361
  • 1
  • 11
  • `scale` in GLM is just pearson_chi2 / df_resid in this case. I don't know how this relates to the usual Gamma parameterization. statmodels GLM does not estimate the scale by maximum likelihood. – Josef Feb 14 '20 at 01:21
  • 3
    As far as I can figure out the GLM parameterization corresponds to `y = np.random.gamma(shape=1 / scale, scale=y_true * scale)`. – Josef Feb 14 '20 at 02:43
  • 1
    Also, if you reduce the upper bound of x to 10, then the results look better because it avoids the small values for the mean. – Josef Feb 14 '20 at 02:44
  • https://github.com/statsmodels/statsmodels/issues/106#issuecomment-586072934 needs more verification – Josef Feb 14 '20 at 02:53
  • Ah! That does seem to resolve the issue. Is this a bug or a documentation issue within statsmodels then? I ask because this doesn't seem to be consistent with either mu = k*theta notation, or with mu=alpha/beta notation (where dispersion would then be the reciprocal of alpha, but here it seems to be the reciprocal of what would be beta). – Anthony S. Feb 18 '20 at 00:53
  • To be clear, I took your solution to suggest that shape might = beta, scale = alpha (= beta*mu), however – as you note – the dispersion given is 1/beta, rather than 1/alpha. – Anthony S. Feb 18 '20 at 00:55
  • 2
    I was only pattern matching the GLM and GAMMA loglike functions. Statsmodels does not really use this equivalence and only uses the GLM mean-dispersion parameterization, so documenting and supporting reparameterization is still incomplete. – Josef Feb 18 '20 at 03:20
  • 2
    (a quick search) section 2 in http://pj.freefaculty.org/guides/stat/Regression-GLM/Gamma/GammaGLM-01.pdf confirms this. In the one-parameter family GLM version, we take the Gamma shape parameter as fixed, and the Gamma scale parameter is related to the mean, i.e. expected value. That corresponds to the implementation in statsmodels. – Josef Feb 18 '20 at 03:24
  • So provided I have fitted a gamma glm using `statsmodels`, what are then the correct values of shape and scale that I can use for, e.g. `scipy.stats.gamma`? – Willem Oct 01 '20 at 11:32

1 Answers1

1

As Josef noted in the comments, statsmodels uses a different kind of parameterization.

Anthony S.
  • 361
  • 1
  • 11