4

I don't understand why the below two gam models produce different results. The only difference is in one of the models I added the namespace specifier gam:: before the functions gam and s.

I want to do this because I am exploring the differences between running the gam function in the gam package and in the mgcv package.

library(ISLR)  
library(gam) 
gam.m3 <- gam::gam(wage ~ gam::s(year,4) + gam::s(age,5) + education,data=Wage) 
gam.m3.orig <- gam(wage ~ s(year,4) + s(age,5) + education, data=Wage)

#Coefficients are different
coef(gam.m3)[1]; coef(gam.m3.orig)[1] 

#Models are different
gam.m3$df.residual; gam.m3.orig$df.residual

Here is the output. It seems like the coefficients and degrees of freedom should not be different; in fact the two models should be exactly the same. But, they are different and I don't understand why. Any suggestions welcome, I'm kind of at a loss right now.

> library(ISLR)  
> library(gam) 
Loading required package: splines
Loading required package: foreach
Loaded gam 1.16

> gam.m3 <- gam::gam(wage ~ gam::s(year,4) + gam::s(age,5) + education,        data=Wage) 
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  non-list contrasts argument ignored
> gam.m3.orig <- gam(wage ~ s(year,4) + s(age,5) + education, data=Wage)
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  non-list contrasts argument ignored
> 
> #Coefficients are different
> coef(gam.m3)[1]; coef(gam.m3.orig)[1] 
(Intercept) 
  -2058.077 
(Intercept) 
  -2339.364 
> 
> #Models are different
> gam.m3$df.residual; gam.m3.orig$df.residual
[1] 2993
[1] 2986
TimW
  • 65
  • 5

1 Answers1

3

gam calls gam.fit and gam.fit has specific code for handling smoothers. This code only works correctly, if the model.frame's "terms" attribute has them correctly specified in its "specials" attribute. Otherwise, the smoothers are handled like any other function, which apparently gives a different result. If you want to know how exactly the smoothers are handled differently, you'll need to study the source code of gam.fit in detail.

Basically, this shows the crucial difference between your two calls to gam:

gam.smoothers()$slist
#[1] "s"      "lo"     "random"

attr(terms(wage ~ s(year,4) + s(age,5) + education, 
           specials = gam.smoothers()$slist), "specials")
#$s
#[1] 2 3
#
#$lo
#NULL
#
#$random
#NULL

attr(terms(wage ~  gam::s(year,4) + gam::s(age,5) + education, 
           specials = gam.smoothers()$slist), "specials")
#$s
#NULL
#
#$lo
#NULL
#
#$random
#NULL

Why do you need to use gam::s? Calling gam::gam should be sufficient to ensure that the correct smoother function is called (via the namespace lookup):

gam::gam(wage ~ s(year,4) + s(age,5) + education,data=Wage)  

Edit:

OK, mgcv::s actually masks gam::s on the search path. One approach to solve that problem can be found there.

Roland
  • 127,288
  • 10
  • 191
  • 288
  • As it turns out, since `mgcv` also has `mgcv::s()`, *not* specifying `gam::s()` **also** causes issues, as discussed in [this question](https://stackoverflow.com/q/56417685/8386140). (+1) though for pointing out what problem is caused by using the namespace specifier for `s`. – duckmayr Jun 07 '19 at 11:00
  • @duckmayr I've added an answer there. – Roland Jun 07 '19 at 14:49
  • Great approach there! I'll award the bounty soon, and on the other question I've pinged the OP to point them toward your superior solution. – duckmayr Jun 07 '19 at 15:35
  • @Roland This is great as is your answer to my other question. I've learned a lot from this. Someday I might try to trace back in the code how glm.fit works and why it didn't read `gam::s` the way I wanted it to - but this gives me more than what I need to keep moving. Thanks duckmayr as well for putting the bounty out there! – TimW Jun 07 '19 at 18:07