3

I am currently trying to help a colleague and I simply cannot find a solution. So I am hoping that someone else might be able to help us.

I have a dataset containing weight data assessed with different study designs, for different species in different studies (a study included multiple designs and multiple species). I want to investigate the relation between the weight and the study design, using study and species as nested random effect.

The model looks like this and runs fine:

m <- lmer(weight ~ design +(1|study/species), data=dataset)

I tried to make predictions for the different species but with a generic study: I made a new data.table new.dt that contains the unique design-species-combinations of the original dataset and added a column for the report.

new.dt <- unique(dataset[,.(design, species))
new.dt$study <- "xyz"

Then I used the predict-function and allowed new levels.

new.dt$p <- predict(m, newdata=new.dt, re.form= NULL, allow.new.levels=TRUE) 

I don't get an error, but I get the same prediction for every species in a design.

Is there a way to keep the original levels of one part of the nested random effect and make the other part a new level?

Thank you in advance!

UPDATE - working example: This issue is not dependent on the dataset.

library(data.table)
library(lme4)

dt <- data.table(expand.grid(design=c("a", "b"), species=c("x", "y", "z"), report=c("1", "2", "3"), count=seq(1, 10, 1)))
dt$weight <- 0

dt[species=="x"]$weight <- rnorm(60, 70, 10)
dt[species=="y"]$weight <- rnorm(60, 80, 15)
dt[species=="z"]$weight <- rnorm(60, 90, 20)

dt[design=="a"]$weight <- dt[design=="a"]$weight- 0.1*dt[design=="a"]$weight

dt[report=="1"]$weight <- dt[report=="1"]$weight+0.15*dt[report=="1"]$weight
dt[report=="2"]$weight <- dt[report=="2"]$weight-0.15*dt[report=="1"]$weight

m <-lmer(weight~design+(1|report/species), data=dt)

dt.pred <- unique(dt[,c(1:2)])
dt.pred$report<- "xyz"
dt.pred$pred<-predict(m, newdata=dt.pred, re.form= NULL, allow.new.levels=TRUE) 
ulfelder
  • 5,305
  • 1
  • 22
  • 40
I Ho
  • 147
  • 1
  • 6
  • You might want to check out the `merTools` package for generating predictions from mixed-effects models via simulation. Documentation [here](https://cran.r-project.org/web/packages/merTools/index.html). – ulfelder Feb 03 '20 at 12:00
  • what is the size of the variance term in `m` for the `species` ? – user20650 Feb 03 '20 at 13:55
  • var(ranef(m)$`species:study`) is 8.15 - if that is the right answer to this question. – I Ho Feb 03 '20 at 15:24
  • no, I mean the variance shown in the output or returned by `VarCorr(m)` -- it was just a thought in case the variance was v small / almost zero then there could be no different in predictions . just a thought. – user20650 Feb 03 '20 at 16:06
  • This seems to give the same results as for `re.form=NA` or `re.form=~0` i.e. no random effects. I wonder if this is relevant: "*The prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs).*" – user20650 Feb 04 '20 at 10:46

2 Answers2

2

The 'sameness' comes from the fact that you are setting re.form = NULL or equivalently re.form = ~ 0.

The Linear mixed effect model models Y|beta,b ~ intercept + X %*% beta + Z %*% b + e, and by setting re.form = NULL you are setting the definition of Z %*% b = 0 during your prediction. As this is the random part of your model, ( i.e. (1|report/species) ) you are removing the random effect of species and report.

In mixed models you would call this kind of prediction "unconditional prediction" (or marginal prediction) [while it is more pseudo-unconditional in practice]. Often it is used in models where the random effect contains individual. In this case, when you observe a new individual you have an unknown random effect, but depending on your study you might only be interested in the "systematic" or "fixed" effect (i.e. did the individual walk to work before getting hit by the car? Did he bike?). Here it makes sense to look only at the unconditional/marginal effect.

Said another way, by setting re.form = NULL you are saying Z %*% b = 0. As species is part of Z with weights b, you are unable to see the species-specific effect on your prediction.

Only if you know the species and can use the random effects in your prediction will you get different prediction across species with the same fixed effects.

Ps. The data.table package has an equivalent function to expand.grid called CJ, which will for larger sets is quite a bit faster and more memory efficient.

Oliver
  • 8,169
  • 3
  • 15
  • 37
  • I'm confused by your explanation when I compare this to the documentation here for the "re.form" parameter : re.form (formula, NULL, or NA) specify which random effects to condition on when predicting. If NULL, include all random effects; if NA or ~0, include no random effects. You've said if NULL there's no random effect, but this said that would actually be true if set to NA, and NULL would imply including all random effects? Am I missing something? https://www.rdocumentation.org/packages/lme4/versions/1.1-28/topics/predict.merMod – Brewkeeper Mar 30 '22 at 03:45
0

You could use the ggeffects-package, which allows you to get predictions either for the fixed effects (including CIs), or conditioned on group levels of random effects (here, no CIs are returned).

Here is an example with your data, more examples can be found in this vignette.

library(data.table)
library(lme4)
#> Loading required package: Matrix

dt <- data.table(expand.grid(design=c("a", "b"), species=c("x", "y", "z"), report=c("1", "2", "3"), count=seq(1, 10, 1)))
dt$weight <- 0

dt[species=="x"]$weight <- rnorm(60, 70, 10)
dt[species=="y"]$weight <- rnorm(60, 80, 15)
dt[species=="z"]$weight <- rnorm(60, 90, 20)

dt[design=="a"]$weight <- dt[design=="a"]$weight- 0.1*dt[design=="a"]$weight

dt[report=="1"]$weight <- dt[report=="1"]$weight+0.15*dt[report=="1"]$weight
dt[report=="2"]$weight <- dt[report=="2"]$weight-0.15*dt[report=="1"]$weight

m <-lmer(weight~design+(1|report/species), data=dt)

library(ggeffects)
ggpredict(m, "design")
#> 
#> # Predicted values of weight
#> # x = design
#> 
#> x | Predicted |   SE |         95% CI
#> -------------------------------------
#> a |     72.64 | 6.57 | [59.77, 85.52]
#> b |     82.66 | 6.57 | [69.78, 95.54]
#> 
#> Adjusted for:
#> * species = 0 (population-level)
#> *  report = 0 (population-level)

ggpredict(m, c("design", "report"), type = "re")
#> 
#> # Predicted values of weight
#> # x = design
#> 
#> # report = 1
#> 
#> x | Predicted
#> -------------
#> a |     80.78
#> b |     90.80
#> 
#> # report = 2
#> 
#> x | Predicted
#> -------------
#> a |     64.91
#> b |     74.92
#> 
#> # report = 3
#> 
#> x | Predicted
#> -------------
#> a |     72.24
#> b |     82.26
#> 
#> Adjusted for:
#> * species = 0 (population-level)

plot(ggpredict(m, c("design", "report"), type = "re"))
#> Loading required namespace: ggplot2

Created on 2020-02-07 by the reprex package (v0.3.0)

Daniel
  • 7,252
  • 6
  • 26
  • 38