
I'm using a generalized additive model (GAM) to predict species abundance from environmental conditions at a given point. The model includes one categorical variable (sediment type, with levels 1, 2, 3 and 4). The model seems to fit just fine; however, the results appear to absorb factor level '1' into the intercept. See below.

Can anyone explain what is happening with this model? I do not fully understand. This was run in R with the mgcv package. Thanks!

Equation:

    abundance ~ s(x) + s(y) + s(z) + s(w) + factor(Sediment)

Parametric coefficients:

                        Estimate Std. Error z value Pr(>|z|)
    (Intercept)            7.138      0.000 7541.26    2e-16
    factor(Sediment)2 -0.2496868  0.0016749 -149.08    2e-16
    factor(Sediment)3 -0.5128687  0.0058931  -87.03    2e-16
    factor(Sediment)4 -0.1467369  0.0034606  -42.40    2e-16

Approximate significance of smooth terms:

           edf Ref.df  Chi.sq p-value
    s(x) 3.983      4   69264   2e-16
    s(y) 3.998      4 1147536   2e-16
    s(z) 3.995      4  197458   2e-16
    s(w) 3.999      4  340085   2e-16
divibisan
  • 1
    It appears that I can't vote to close this as a duplicate of a question on stats.stackexchange.com. Yet [there](http://stats.stackexchange.com/q/26539/5055) it is. – joran Jun 12 '12 at 17:06

1 Answer

4

The intercept represents the expected abundance (on the scale of the linear predictor) for sediment type 1, because that is the reference level (the first level of the factor). The other estimates are the coefficients for the remaining levels of sediment type, and each represents the deviation of that level from the reference level (sediment type 1).

This is the standard convention for factor variables in models. If the model has an intercept, you can't also include a separate column for every level of the factor, because the resulting columns of the model matrix would be linearly dependent on each other; the same information can be represented with one fewer column in the model matrix.
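To make the linear dependence concrete, here is a minimal base R sketch. The `sediment` vector is made-up illustration data, not the data from the question. It shows the columns R builds by default for a four-level factor, and that appending a dummy column for level 1 adds a column without adding any rank.

```r
# Hypothetical data: four sediment types, two observations each
sediment <- factor(c(1, 1, 2, 2, 3, 3, 4, 4))

# Default (treatment) contrasts: an intercept plus dummies for levels 2, 3, 4
X <- model.matrix(~ sediment)
print(colnames(X))

# Append a dummy column for level 1 as well
X_full <- cbind(X, sediment1 = as.numeric(sediment == "1"))

# Five columns, but the rank does not increase, because
# sediment1 = (Intercept) - sediment2 - sediment3 - sediment4
print(ncol(X_full))
print(qr(X_full)$rank)
```

The rank stays at four because the level-1 dummy is an exact linear combination of the intercept and the other three dummies, which is why one level has to be folded into the intercept.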

If you want, you can drop the intercept by adding `- 1` to the formula, but I don't see a reason to do so here.
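As a rough sketch of what dropping the intercept changes, the following fits both parameterisations with mgcv on simulated data. Everything about the data is an assumption made for illustration: the single covariate `x`, the Poisson family, and the "true" sediment effects are invented and are not taken from the question. With `- 1`, each sediment level gets its own coefficient, and those should match the intercept plus the corresponding deviation from the first model, up to numerical tolerance.

```r
library(mgcv)

set.seed(1)
n <- 400
d <- data.frame(
  x = runif(n),
  Sediment = factor(sample(1:4, n, replace = TRUE))
)

# Hypothetical true effects on the log scale (assumed for illustration only)
sed_effect <- c(0, -0.25, -0.5, -0.15)[as.integer(d$Sediment)]
d$abundance <- rpois(n, exp(2 + sin(2 * pi * d$x) + sed_effect))

# With intercept: level 1 is the reference, other levels are deviations
m1 <- gam(abundance ~ s(x) + Sediment, family = poisson, data = d)

# Without intercept: one coefficient per sediment level
m2 <- gam(abundance ~ s(x) + Sediment - 1, family = poisson, data = d)

# Per-level values on the link scale, reconstructed from m1
level_means_m1 <- unname(coef(m1)[1] + c(0, coef(m1)[2:4]))
level_means_m2 <- unname(coef(m2)[1:4])

print(rbind(with_intercept = level_means_m1, no_intercept = level_means_m2))

# Largest difference in fitted values between the two parameterisations
print(max(abs(fitted(m1) - fitted(m2))))
```

The two models describe the same fit; only the meaning of the parametric coefficients differs, so the choice is about which set of numbers is easier to interpret.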

Gavin Simpson
  • Adding `-1` to the formula should result in further SO or SE questions, though. – IRTFM Jun 12 '12 at 17:34
  • 1
    @DWin ??? You mean in terms of statistically interpreting such a model? Yes, I agree; with all the identifiability-constraint stuff going on in the GAM that Simon Wood has worked out, I would just choose to trust him and keep the intercept. – Gavin Simpson Jun 12 '12 at 18:43
  • I thought it would raise further questions about interpretations of contrasts in R. I wasn't thinking about the GAM case in particular. – IRTFM Jun 12 '12 at 18:55