Calculating scale/dispersion of Gamma GLM using statsmodels

Question

I'm having trouble obtaining the dispersion parameter of simulated data using statsmodels' GLM function.

import statsmodels.api as sm
import matplotlib.pyplot as plt 
import scipy.stats as stats 
import numpy as np

np.random.seed(1)

# Generate data
x=np.random.uniform(0, 100,50000)
x2 = sm.add_constant(x)
a = 0.5
b = 0.2
y_true = 1/(a+(b*x))
# Add error 
scale = 2 # the scale parameter I'm trying to obtain
shape = y_true/scale # given that, for Gamma, mu = scale*shape
y = np.random.gamma(shape=shape, scale=scale)

# Run model
model = sm.GLM(y, x2, family=sm.families.Gamma()).fit() 

model.summary()

Here's the summary from above:

Note that the coefficient estimates are correct (0.5 and 0.2), but the scale (21.995) is way off the scale I set (2).

Can someone point out what it is I'm misunderstanding/doing wrong? Thanks!

`scale` in GLM is just pearson_chi2 / df_resid in this case. I don't know how this relates to the usual Gamma parameterization. statmodels GLM does not estimate the scale by maximum likelihood. — Josef, Feb 14 '20 at 01:21
As far as I can figure out the GLM parameterization corresponds to `y = np.random.gamma(shape=1 / scale, scale=y_true * scale)`. — Josef, Feb 14 '20 at 02:43
Also, if you reduce the upper bound of x to 10, then the results look better because it avoids the small values for the mean. — Josef, Feb 14 '20 at 02:44
https://github.com/statsmodels/statsmodels/issues/106#issuecomment-586072934 needs more verification — Josef, Feb 14 '20 at 02:53
Ah! That does seem to resolve the issue. Is this a bug or a documentation issue within statsmodels then? I ask because this doesn't seem to be consistent with either mu = k*theta notation, or with mu=alpha/beta notation (where dispersion would then be the reciprocal of alpha, but here it seems to be the reciprocal of what would be beta). — Anthony S., Feb 18 '20 at 00:53
To be clear, I took your solution to suggest that shape might = beta, scale = alpha (= beta*mu), however – as you note – the dispersion given is 1/beta, rather than 1/alpha. — Anthony S., Feb 18 '20 at 00:55
I was only pattern matching the GLM and GAMMA loglike functions. Statsmodels does not really use this equivalence and only uses the GLM mean-dispersion parameterization, so documenting and supporting reparameterization is still incomplete. — Josef, Feb 18 '20 at 03:20
(a quick search) section 2 in http://pj.freefaculty.org/guides/stat/Regression-GLM/Gamma/GammaGLM-01.pdf confirms this. In the one-parameter family GLM version, we take the Gamma shape parameter as fixed, and the Gamma scale parameter is related to the mean, i.e. expected value. That corresponds to the implementation in statsmodels. — Josef, Feb 18 '20 at 03:24
So provided I have fitted a gamma glm using `statsmodels`, what are then the correct values of shape and scale that I can use for, e.g. `scipy.stats.gamma`? — Willem, Oct 01 '20 at 11:32

score 1 · Accepted Answer · answered Apr 22 '20 at 22:05

1

As Josef noted in the comments, statsmodels uses a different kind of parameterization.

answered Apr 22 '20 at 22:05

Anthony S.

361
1
11

Calculating scale/dispersion of Gamma GLM using statsmodels

1 Answers1

Linked