
I have a model of the form `mod <- lmer(value ~ time + study + (1|participant))` where I am not really interested in the `study` coefficient but want to keep it as a fixed effect for various reasons. However, I want to make predictions from `mod` without the effect of the `study` coefficient. I have sum-coded the `study` factor, so I really just need a way to remove the `study` coefficient.

Calculating the predictions manually from the coefficients seems a bit cumbersome. I guess one alternative is to replace the `study` coefficients with zeros (Replace lmer coefficients in R).

Any better ideas?

A minimal example:

library("lme4")
df <- data.frame(
  participant = as.factor(c(1,1,1,2,2,2,3,3,3,4,4,4)),
  time = as.factor(c(1,2,3,1,2,3,1,2,3,1,2,3)),
  study = as.factor(c("A", "A", "A", "B", "B", "B", "A","A", "A", "C", "C", "C")),
  value = c(1,2,3,3,4,1,3,5,3,1,7,2)
)

contrasts(df$study) = contr.sum(length(unique(df$study)))

mod <- lmer(value ~ time + study + (1|participant), data = df)

predict(mod, newdata = data.frame(time = as.factor(c(1,2,3))), re.form=NA)
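
For reference, the coefficient-zeroing idea mentioned above can be sketched like this. This is only a sketch: the `grepl("^study", ...)` pattern relies on lme4's default coefficient naming, and the snippet re-creates the example data so it runs on its own. With sum coding, zeroing the `study` coefficients yields the unweighted mean across study levels.

```r
library("lme4")

# Re-create the example data and model so this snippet is self-contained
df <- data.frame(
  participant = as.factor(rep(1:4, each = 3)),
  time        = as.factor(rep(1:3, times = 4)),
  study       = as.factor(c("A","A","A","B","B","B","A","A","A","C","C","C")),
  value       = c(1,2,3,3,4,1,3,5,3,1,7,2)
)
contrasts(df$study) <- contr.sum(3)
mod <- lmer(value ~ time + study + (1|participant), data = df)

# Zero out the study coefficients (lme4 names them "study1", "study2", ...)
beta <- fixef(mod)
beta[grepl("^study", names(beta))] <- 0

# Build the fixed-effects model matrix by hand; the study value here is only
# a placeholder, since its coefficients are now zero
newdat <- data.frame(time = as.factor(1:3),
                     study = factor("A", levels = levels(df$study)))
X <- model.matrix(~ time + study, data = newdat,
                  contrasts.arg = list(study = "contr.sum"))
drop(X %*% beta)  # population-level predictions with the study effect removed
```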
taffel
  • Could you specify the reasons for trying to ignore the `study` variable? In a (hierarchical) linear model, which you have specified above, it's just plain statistics that the predictor parameters will be estimated with respect to the other predictors. If you just want to store different models for different studies (guessing from the variable name), you could e.g. store them in a list. – Dom42 Nov 22 '20 at 14:55
  • I would have preferred to include `study` as a random intercept as there were baseline differences between the studies. Because of the low number of studies, we would prefer to include it as a fixed variable instead, to take the baseline differences into account, but still have predictions on the mean population level – taffel Nov 22 '20 at 18:33

1 Answer


This doesn't make much sense to me. If you want to ignore the effect of study, you simply leave it out of the model you use for predictions. If you want to include it in your model, you need to include it in the predictions.

Suppose you want to predict using only time. You can average over the study effect, weighted by its distribution in the data, by replicating the entire study column for each value of time you wish to predict and then averaging the resulting predictions:

mod <- lmer(value ~ time + study + (1|participant), data = df)

pred_df <- data.frame(time = as.factor(rep(c(1,2,3), each = nrow(df))),
                      study = rep(df$study, 3))

preds <- predict(mod, newdata = pred_df, re.form = NA)

tapply(preds, pred_df$time, mean)
#>    1    2    3 
#> 2.00 4.50 2.25

Note though, that this is equivalent to just leaving study out of your model altogether:

mod <- lmer(value ~ time + (1|participant), data = df)

predict(mod, newdata = data.frame(time = as.factor(c(1,2,3))), re.form = NA)
#>    1    2    3 
#> 2.00 4.50 2.25
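
A related sketch (my addition, not part of the original answer): the averaging above weights each study by how often it appears in the data. If you instead want the unweighted mean over study levels — which is what the intercept of a sum-coded model represents — you can average predictions over a balanced grid in which each study level appears exactly once per time point. The snippet re-creates the example data and refits the full model so it runs on its own.

```r
library("lme4")

# Re-create the example data so this snippet is self-contained
df <- data.frame(
  participant = as.factor(rep(1:4, each = 3)),
  time        = as.factor(rep(1:3, times = 4)),
  study       = as.factor(c("A","A","A","B","B","B","A","A","A","C","C","C")),
  value       = c(1,2,3,3,4,1,3,5,3,1,7,2)
)
mod_full <- lmer(value ~ time + study + (1|participant), data = df)

# Balanced grid: each study level appears exactly once per time point,
# so the average is unweighted rather than weighted by study frequency
grid <- expand.grid(time = as.factor(1:3),
                    study = factor(levels(df$study)))
preds <- predict(mod_full, newdata = grid, re.form = NA)
tapply(preds, grid$time, mean)
```

Note the numbers will generally differ from the weighted version above, because study "A" appears twice as often as "B" or "C" in the data.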
Allan Cameron
  • Thank you @allan-cameron for your answer; you are right that the two models provide the same predictions, and I'll probably go with your first solution! The two models you propose differ in terms of variance, though, and I guess they would perform differently if we were to use log-transformation and back-transformation with bias correction? – taffel Nov 22 '20 at 18:46