What do you set grouping factor to when using glmer/lme4 and predictInterval?

Question

Problem: Using multilevel (mixed-effects) model and not sure what to set grouping variable to in order to generate predicted probabilities for a measured group-level variable from glmer model using merTools' predictInterval function.

Goal: Generate predicted probabilities and SEs/CIs across a range of values from a "second level" group-level variable.

Seeking: advice on how to properly do this or other recommendations to generate predicted probabilities and CIs the range of values for a group level variable from a glmer model.

library(lme4)
library(merTools)
library(ggplot2)

hier_data <- data_frame(pass = sample(c(0, 1), size = 1000, replace = T),
                        wt = rnorm(1000),
                        ht = sample(1:5, size = 1000, replace = T, prob = c(.1, .1, .6, .1, .1)),
                        school_funding = rnorm(1000),
                        school = rep(c("A", "B", "C", "D", "E"), each = 200))

mod <- glmer(pass ~ wt + ht + school_funding + (1 | school),
             family = binomial("logit"), data = hier_data)

### Without school: error
ndata <- data.frame(wt = median(hier_data$wt),
                    ht = median(hier_data$ht),
                    school_funding = seq(from = min(hier_data$school_funding), to =max(hier_data$school_funding), length.out = 100))

pp <- cbind(ndata, predictInterval(merMod = mod,
                      newdata = ndata,
                      type = "probability"))

### Problem, when adding school variable: which school?
ndata <- data.frame(wt = median(hier_data$wt),
                    ht = median(hier_data$ht),
                    school_funding = seq(from = min(hier_data$school_funding), to =max(hier_data$school_funding), length.out = 100),
                    school = "A")

pp <- cbind(ndata, predictInterval(merMod = mod,
                                   newdata = ndata,
                                   type = "probability"))

ggplot(pp, aes(x = school_funding, y = fit)) +
    geom_point() +
    geom_errorbar(aes(ymin = lwr, ymax = upr))

Oliver · Answer 1 · 2019-09-25T15:27:48.910

It seems what you are trying to achieve is effects plots for your variables, with fast prediction intervals. Note first of all that predictInterval does not incorporate the uncertainty in the estimated values of the variance parameters, theta. If more accurate confidence intervals are needed, you should use the bootMer function as described in ?bootMer which uses bootstrapping to estimate the uncertainty. However it might simply be infeasible as the model size and complexity increases. Alternatively the effects package contains the capability to illustrate the effects of merMod objects (however the documentation is simply atrocious).

In general when illustrating the effects of merMod objects a question is "which effects?". Are you interested in the marginal effects or the conditional effects (such as variability in random effects?). If your model contains only first-order random effects (no random slopes), and you are interested in the uncertainty of the fixed-effect coefficient or the effect on the conditional mean, you can get away with using any school and specifying which = "fixed" in predictInterval as

pp <- cbind(ndata, predictInterval(merMod = mod,
                                   newdata = ndata, #<= any school chosen
                                   type = "probability",
                                   which = "fixed"))

Note the size will depend on the chosen school and remaining coefficients as in standard models, and are thus not causal.

If you are interested in the marginal effect, there are multiple methods for approximating this. The optimal would be to bootstrap the predicted values of the marginal mean. Alternatively if the number of independent groups in your grouping variable is "large" enough, you could (maybe) average predicts intervals across groups as illustrated below

newData <- expand.grid(wt = median(hier_data$wt), 
                       ht = median(hier_data$ht),
                       school = levels(hier_data$school),
                       school_funding = seq(from = min(hier_data$school_funding), 
                                            to = max(hier_data$school_funding), 
                                            length.out = 100))
pp <- predictInterval(merMod = mod,
                       newdata = newData,
                       type = "probability")
#Split predictions by every column but school
# And calculate estimated means
predictions <- do.call("rbind", lapply(split(as.data.frame(pp), 
                                             newData[, !names(newData) == "school"]), 
                                       colMeans))
rownames(predictions) <- 1:nrow(predictions)
#create a plot
ggplot(as.data.frame(cbind(predictions, funding = newData$school_funding[newData$school == "A"])), 
       aes(x = funding, y = fit, ymax = upr, ymin = lwr)) + 
    geom_point() +
    geom_errorbar()

For this example the model is more often than not singular and contains very few groups, and as such the result is a unlikely to be a great estimator for the marginal effect, but outside of extracting the simulations from predictInterval it might suffice. It is likely going to improve with models with more grouping levels in the random effect. predictInterval doesn't seem to incorporate a method for this situation directly.

An alternative for looking at marginal effects would be assuming marginal mean of the form 1/(1+exp(-eta) (which is often assumed for new groupings of the random effect). This isn't directly implemented in the predictInterval function, but can be achieved by substracting the random effect from the linear predicter, and only estimating the randomness of the fixed effects, as below:

pp <- predictInterval(merMod = mod,
                      newdata = ndata, #<= any school chosen
                      type = "linear.prediction",
                      which = "fixed")
#remove random effects
pp <- sweep(pp, 1, predict(mod, newdata = ndata, random.only = TRUE), "-")
pp <- 1/(1+exp(-pp))

which could then be plotted using similar methods. For fewer groups this might be a better predictor for the marginal mean(?, someone might correct me here).

In either case, adding a bit of x-jitter might improve the plot.

In all cases there might be some golden nuggets in the references to GLMM FAQ by bolker and others.

What do you set grouping factor to when using glmer/lme4 and predictInterval?

1 Answers1