
My response variable follows a gamma distribution and I'm trying to use GammaRegressor from scikit-learn:

from sklearn.linear_model import GammaRegressor

model = GammaRegressor()
model.fit(X_train_scaled, y_train)
pred_gamma = model.predict(X_test_scaled)

What are the shape and the scale parameters in this case and how does GammaRegressor calculate them?

These are the shape and scale of my original data:

from scipy.stats import gamma
shape_k, loc, scale = gamma.fit(y_train)
print(shape_k, loc, scale)
desertnaut
Malke

1 Answer


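GammaRegressor fits a generalized linear model in which the target variable is gamma-distributed conditional on the independent variables: its conditional mean is the inverse link function (scikit-learn uses a log link, so an exponential) applied to a linear function of those variables. Your last snippet instead fits a single gamma distribution to the target by itself, ignoring the features. The predictions from the regression are the means of the corresponding conditional gamma distributions, and the shape parameter is assumed to be constant across observations (it can be estimated, but as far as I can tell it isn't exposed by sklearn). Since the mean of a gamma distribution is shape × scale, the scale then varies per observation as prediction / shape.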
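As a rough sketch of how the shape and the per-observation scale could be recovered by hand: the snippet below estimates the GLM dispersion with the Pearson statistic, which is one common choice and not something sklearn does for you. The synthetic data, the true shape of 5, and all variable names are assumptions made up for illustration.

```python
import numpy as np
from scipy.stats import gamma
from sklearn.linear_model import GammaRegressor

rng = np.random.default_rng(0)

# Hypothetical synthetic data: log-link mean, constant shape of 5.
n, true_shape = 2000, 5.0
X = rng.normal(size=(n, 2))
mu_true = np.exp(0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1])
y = rng.gamma(shape=true_shape, scale=mu_true / true_shape)

# alpha=0.0 switches off the default L2 penalty so the fit is a plain GLM.
model = GammaRegressor(alpha=0.0, max_iter=1000)
model.fit(X, y)
mu_hat = model.predict(X)  # estimated conditional means

# Pearson estimate of the dispersion phi; the gamma variance function is mu**2.
n_params = X.shape[1] + 1  # coefficients plus intercept
phi_hat = np.sum(((y - mu_hat) / mu_hat) ** 2) / (n - n_params)

# Convert the (mean, dispersion) parameterization to (shape, scale).
shape_hat = 1.0 / phi_hat      # one constant shape for all observations
scale_hat = mu_hat * phi_hat   # one scale per observation

# Conditional distributions as scipy objects, with loc fixed at 0.
cond_dist = gamma(a=shape_hat, scale=scale_hat)
print(shape_hat)
```

Note that gamma.fit(y_train) in the question also estimates a location parameter; passing floc=0 would restrict it to the two-parameter gamma that matches the GLM parameterization.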

Going into more detail would be off-topic here, but see e.g. these three questions on stats.SE involving R's implementation, and the statsmodels documentation for GLMs.

Ben Reiniger
    Some further links for getting the Gamma distribution parameters from the GLM (mean, dispersion) parameterization: https://github.com/statsmodels/statsmodels/issues/7525, https://github.com/statsmodels/statsmodels/blob/main/statsmodels/genmod/families/family.py#L851 and https://github.com/statsmodels/statsmodels/issues/106#issuecomment-586072934 – Josef Mar 09 '23 at 02:46