Why are my categorical variable's coefficients labeled with seemingly random suffixes like "L", "Q" and "C"? Those aren't its levels

Question

How can I get mgcv::gam to display my actual treatment names in the names of the parametric coefficients? Right now they just have seemingly random letters appended.

Family: quasipoisson 
Link function: log 

Formula:
weekly_eggs ~ food + s(week) + s(week, by = Ofood) + s(id, 
    bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.5624     0.2031  -2.769  0.00581 **
food.L        0.1011     0.4086   0.247  0.80473   
food.Q        0.5398     0.4076   1.324  0.18594   
food.C        0.2136     0.4053   0.527  0.59838   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df      F  p-value
s(week)            6.403  6.403 49.376  < 2e-16 ***
s(week):Ofoodlow   1.000  1.000  0.001 0.976359
s(week):Ofoodmed   1.000  1.000 14.360 0.000167 ***
s(week):Ofoodhigh  2.078  2.078  1.656 0.141125
s(id)             43.669 49.000 12.984  < 2e-16 ***

Confusingly, the smooth terms actually have the treatment names. Before, this problem with the parametric coefficients was fixed by adding an "O" before naming the treatment column, but that seems to not work.

I set all groups to factors before gam, and also created a new column with the O in front of the treatment column which somehow makes it easier to make the summed smooth curves later.

Any advice...? Still unfamiliar with a lot of the gam function. I know this isn't reproducible code, but wasn't sure how to do that with this situation.

Code:

library(mgcv)  # provides gamm()

# Convert the grouping and treatment columns to factors
df_P_egg_gamm$id = as.factor(df_P_egg_gamm$id)
df_P_egg_gamm$food = as.factor(df_P_egg_gamm$food)
# Ordered copy of the treatment, used for the by= smooths
df_P_egg_gamm$Ofood = factor(df_N_P_egg_gamm$food, ordered=TRUE)

egg_gamm_P = gamm(weekly_eggs ~ food + s(week) + s(week, by=Ofood) + s(id, bs="re"), family="quasipoisson",
                correlation=corCAR1(form=~week|id), data=df_P_egg_gamm)

One tip is that any time you get unexpected output in a statistical model, ask yourself, "Why am I getting this output?" instead of "How do I change the labels to make the output different?" For example, in this case, that you got 3 coefficients for one of the linear terms - which should only have 1 coefficient - should raise a flag that the model is doing something other than expected. More generally, when you include factors as predictors in any model in R, it will almost always either throw an error or it will recode your factor into binary variables for each factor level except the lowest. — socialscientist, Jul 28 '22 at 19:36
Yeah, I suppose I meant to ask why it is and not simply how to change it- I definitely don't want to do poor statistical models. However, I don't understand (again! new at this!) why you are saying I have 3 coefficients for a linear term? To me, the whole point of why I am doing the gamm is because I am not trying to do a linear regression... — lmbradley, Jul 30 '22 at 16:38
There are many reasons to use a GAM and many reasons not to use one, so I'll refrain from discussing whether you should use one here. However, I'd recommend reviewing the basics of the model, in particular the smooth versus non-smooth terms. Then check out exactly what `s()` is doing: https://www.rdocumentation.org/packages/mgcv/versions/1.8-40/topics/smooth.terms Then consider the different arguments you're using with `s()` and what *not* passing a factor to `s()` does. — socialscientist, Jul 30 '22 at 18:10
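
For anyone landing here with the same output, here is a minimal sketch of where the `.L`, `.Q` and `.C` suffixes can come from. It assumes `food` was already an ordered factor before `as.factor()` was called, which the question does not show. The toy data and the level names (including `none`) are hypothetical. With R's default contrasts, an unordered factor gets treatment contrasts named after its levels, while an ordered factor gets polynomial contrasts named `.L` (linear), `.Q` (quadratic) and `.C` (cubic).

```r
# Hypothetical four-level treatment; the real data are not shown in the question.
lvls <- c("none", "low", "med", "high")
d <- data.frame(food = factor(rep(lvls, times = 3), levels = lvls))

# Unordered factor: default treatment contrasts, coefficients named by level.
mm_unord <- model.matrix(~ food, data = d)
print(colnames(mm_unord))

# Ordered factor: default polynomial contrasts, coefficients named .L/.Q/.C.
d$Ofood <- factor(d$food, levels = lvls, ordered = TRUE)
mm_ord <- model.matrix(~ Ofood, data = d)
print(colnames(mm_ord))

# as.factor() returns an existing factor unchanged, so it does NOT drop ordering.
still_ordered <- is.ordered(as.factor(d$Ofood))
print(still_ordered)

# Passing ordered = FALSE to factor() does drop it, restoring level-based names.
d$food_plain <- factor(d$Ofood, levels = lvls, ordered = FALSE)
mm_plain <- model.matrix(~ food_plain, data = d)
print(colnames(mm_plain))
```

If that assumption holds for the real data, checking `is.ordered(df_P_egg_gamm$food)` before fitting would confirm it, and rebuilding `food` with `factor(..., ordered = FALSE)` should bring back level-based coefficient names while `Ofood` stays ordered for the `by=` smooths.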
