I'm having trouble understanding effect coding with glm. As an example:
library(ggplot2)  # the mpg data set ships with ggplot2
data('mpg')
mpg$trans = as.factor(mpg$trans)
levels(mpg$trans)
[1] "auto(av)" "auto(l3)" "auto(l4)" "auto(l5)" "auto(l6)" "auto(s4)" "auto(s5)" "auto(s6)" "manual(m5)" "manual(m6)"
My understanding is that for effect (or dummy) coding, glm takes the first level as the reference level, so in this case that would be "auto(av)".
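Which level a coding scheme leaves without its own coefficient can be read off the contrast matrix itself. A minimal sketch on a made-up three-level factor (`f` and its levels are illustrative, not taken from `mpg`), comparing `contr.treatment` (dummy coding) with `contr.sum` (effect coding):

```r
# Toy factor with three levels; the names are hypothetical.
f <- factor(c("a", "b", "c"))

# One row per level, one column per estimated coefficient.
ct <- contr.treatment(levels(f))  # dummy coding
cs <- contr.sum(levels(f))        # effect (sum-to-zero) coding

print(ct)  # the row for the first level is all zeros
print(cs)  # no all-zero row; the row for the last level is all -1
```

In `contr.treatment` the first level is the all-zero row, while in `contr.sum` the last level is the row of -1s. The same check should work on the real factor with `contr.sum(levels(mpg$trans))`.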
mpg_glm = glm(hwy ~ trans, data = mpg, contrasts = list(trans = contr.sum))
summary(mpg_glm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  24.4173     0.7318  33.366  < 2e-16 ***
trans1        3.3827     2.3592   1.434 0.153017
trans2        2.5827     3.6210   0.713 0.476426
trans3       -2.4534     0.9157  -2.679 0.007928 **
trans4       -3.6993     1.0865  -3.405 0.000784 ***
trans5       -4.4173     2.1743  -2.032 0.043375 *
trans6        1.2494     2.9866   0.418 0.676105
trans7        0.9160     2.9866   0.307 0.759341
trans8        0.7702     1.4517   0.531 0.596262
trans9        1.8758     0.9845   1.905 0.058011 .
So I'm now thinking that trans1 is actually the second level ("auto(l3)"), because the first one is the reference level. To test this, I relevel the factor so that "auto(l3)" comes first and "auto(av)" should get its own coefficient:
mpg$trans <- relevel(mpg$trans, ref="auto(l3)")
levels(mpg$trans)
[1] "auto(l3)" "auto(av)" "auto(l4)" "auto(l5)" "auto(l6)" "auto(s4)" "auto(s5)" "auto(s6)" "manual(m5)" "manual(m6)"
Now I expect to see a coefficient for "auto(av)" and no coefficient for "auto(l3)" (because that is now the reference level):
mpg_glm = glm(hwy ~ trans, data = mpg, contrasts = list(trans = contr.sum))
summary(mpg_glm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  24.4173     0.7318  33.366  < 2e-16 ***
trans1        2.5827     3.6210   0.713 0.476426
trans2        3.3827     2.3592   1.434 0.153017
trans3       -2.4534     0.9157  -2.679 0.007928 **
trans4       -3.6993     1.0865  -3.405 0.000784 ***
trans5       -4.4173     2.1743  -2.032 0.043375 *
trans6        1.2494     2.9866   0.418 0.676105
trans7        0.9160     2.9866   0.307 0.759341
trans8        0.7702     1.4517   0.531 0.596262
trans9        1.8758     0.9845   1.905 0.058011 .
The only thing that changed is that the first two coefficients are swapped; everything else, including the intercept, is identical. So which level is actually taken as the reference? What am I missing here?
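One way to probe this is to compare the fitted coefficients with the group means directly. The sketch below uses the built-in `PlantGrowth` data (three groups) instead of `mpg` so that it runs without extra packages; the variable names `group_means`, `grand_mean` and `effects` are made up for illustration. It assumes a Gaussian `glm` with sum-to-zero contrasts, as in the models above:

```r
# Fit the same kind of model on a small built-in data set.
fit <- glm(weight ~ group, data = PlantGrowth,
           contrasts = list(group = contr.sum))

# Per-level means and the unweighted mean of those means.
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
grand_mean  <- mean(group_means)

b <- coef(fit)

# Under sum-to-zero coding the level effects add up to zero, so the level
# without its own coefficient (the last one) is minus the sum of the others.
effects <- c(b[-1], -sum(b[-1]))
names(effects) <- levels(PlantGrowth$group)

# Both differences should be zero up to floating-point error.
print(b["(Intercept)"] - grand_mean)
print(effects - (group_means - grand_mean))
```

If the same pattern holds for `mpg`, the intercept would be the mean of the ten level means rather than the mean of any single level, which would be consistent with it not changing after `relevel()`.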