1

I have been looking into the gamlss package for fitting semiparametric models and came across something strange in the ga() function. Even if the model is specified as having a gamma distribution, fitted using REML, the output for the model is Gaussian, fitted using GCV.

Example::

library(mgcv)
library(gamlss)
library(gamlss.add)
data(rent)
ga3 <- gam(R~s(Fl)+s(A), method="REML", data=rent, family=Gamma(log))
gn3 <- gamlss(R~ga(~s(Fl)+s(A), method="REML"), data=rent, family=GA)

Model summary for the GAM::

summary(ga3)
Family: Gamma 
Link function: log 

Formula:
R ~ s(Fl) + s(A)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.667996   0.008646   771.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df      F p-value    
s(Fl) 1.263  1.482 442.53  <2e-16 ***
s(A)  4.051  4.814  36.34  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.302   Deviance explained = 28.8%
-REML =  13979  Scale est. = 0.1472    n = 1969

Model summary for the GAMLSS::

summary(getSmo(gn3))
Family: gaussian 
Link function: identity 

Formula:
Y.var ~ s(Fl) + s(A)

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.306e-13  8.646e-03       0        1

Approximate significance of smooth terms:
        edf Ref.df      F p-value    
s(Fl) 1.269  1.492 440.14  <2e-16 ***
s(A)  3.747  4.469  38.83  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.294   Deviance explained = 29.6%
GCV = 0.97441  Scale est. = 0.97144   n = 1969

Question::

Why is the model output giving the incorrect distribution and fitting method? Is there something that I am missing here and this is correct?

Constantin
  • 132
  • 9
  • Hi Ben, I included an example from the help section for ?ga(). It is using the rent data in the package for mgcv. I just included it in my question. I can try simulate data and include that if you think there would be a further benefit? – Constantin Apr 13 '22 at 21:19
  • No, this looks fine. Can you explain, for those unfamiliar with this framework, what `getSmo()` does/what it's for? I note that `summary(gn3)` by itself *does* say it's using a Gamma-family response ... – Ben Bolker Apr 13 '22 at 21:23
  • I am fairly new to this myself but it seems that getSmo() is used for extracting the information for the fitted smoothers. If you run summary(gn3) only includes parameter values for the mu and sigma intercept though, it doesn't include the estimates for the covariates. – Constantin Apr 13 '22 at 21:30
  • 1
    A wild guess is that the smooth-term part of the model is essentially fitting a Gaussian model on some latent space (i.e, the random effects/latent variables are assumed to be multivariate Gaussian on the link scale?) But you'd have to know more of the theory of how GAMLSS operates than I do in order to evaluate that guess ... In some sense the question should be more "what is `getSmo()` doing?" than "what is `ga()` doing?" ... – Ben Bolker Apr 13 '22 at 21:38

2 Answers2

0

When using the ga()-function, gamlss calls in the background the gam()-function from mgcv without specifying the family. As a result, the splines are fitted assuming a normal distribution. Therefore you see when showing the fitted smoothers family: gaussian and link function: identity. Also note that the scale estimate returned when using ga() is related to the normal distribution.

0

Yes, when using the ga()-function, each gamlss iteration calls in the background the gam()-function from mgcv. It uses the correct local working variable and local weights for a gamma distribution on each iteration.

Robert
  • 186
  • 2