
I'm trying to get population-level prediction intervals (PIs) from ggeffects::ggpredict() using type = "re" with an nlme::lme() model. ggpredict() is not returning the expected output for the lme() model, while the equivalent lmer() model works fine. My data are autocorrelated repeated measures, so I need lme() with correlation = corAR1().

I'm not sure whether this is a bug, or whether I'm just trying to do something that the tools I'm using aren't designed for.

library(lme4)
library(nlme)
library(ggeffects)

Data <- data.frame(
  Subject = factor(c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
                     4, 4, 4, 4, 4, 4, 4, 4, 4, 
                     5, 5, 5, 5, 5, 5, 5, 5, 5, 
                     6, 6, 6, 6, 6, 6, 6, 6, 
                     7, 7, 7, 7, 7, 7, 7, 7, 
                     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
                     9, 9, 9, 9, 9, 9, 9, 9, 
                     13, 13, 13, 13, 13, 13, 13, 13, 
                     14, 14, 14, 14, 14, 14, 14, 14, 14, 
                     19, 19, 19, 19, 19, 19, 19)), 
  x = c(20.0, 28.5, 38.0, 47.5, 57.0, 66.5, 76.0, 85.5, 95.0, 100.0, 
           21.0, 31.5, 42.0, 53.0, 63.0, 73.5, 84.0, 95.0, 100.0, 
           20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 
           22.0, 33.0, 44.0, 56.0, 67.0, 78.0, 89.0, 100.0, 
           21.5, 32.0, 43.0, 54.0, 65.0, 76.0, 86.5, 100.0, 
           20.0, 29.0, 38.5, 48.5, 58.0, 67.5, 77.0, 87.0, 96.5, 100.0, 
           23.0, 33.0, 44.0, 56.0, 67.0, 78.0, 89.0, 100.0, 
           23.5, 34.5, 46.5, 57.5, 69.5, 80.5, 92.5, 100.0, 
           20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 
           25.0, 37.5, 50.0, 62.5, 75.0, 87.5, 100.0)/100,
  y = c(1.10, 1.00, 1.25, 1.60, 1.40, 1.20, 2.50, 4.60, 6.80, 10.40, 
          0.90, 1.00, 0.75, 0.90, 1.10, 1.70, 4.35, 9.95, 11.45, 
          1.20, 0.70, 1.30, 1.40, 0.70, 1.25, 2.30, 4.30, 8.20, 
          1.55, 1.15, 0.95, 1.10, 1.90, 3.25, 7.20, 14.30, 
          1.85, 2.00, 1.70, 2.00, 2.35, 3.30, 7.30, 12.10, 
          2.20, 1.95, 1.15, 1.55, 1.65, 3.00, 4.45, 9.05, 13.75, 15.85, 
          1.55, 1.20, 1.35, 1.60, 1.65, 4.70, 6.45, 10.80, 
          1.00, 0.90, 1.00, 1.10, 1.60, 3.60, 8.05, 12.30, 
          0.85, 1.00, 1.05, 1.00, 1.35, 2.00, 3.65, 6.75, 13.10, 
          2.25, 2.35, 2.40, 2.80, 4.90, 8.15, 13.50)
)

Model.lme4 <- lmer(
  y ~ x + (1 | Subject),
  data = Data
)

# First, run lme() without autocorrelation
Model.nlme <- lme(
  fixed = y ~ x,
  random = ~ 1 | Subject,
  data = Data
)

# The expected output is returned for the lmer() model:
ggpredict(
  Model.lme4,
  terms = c("x [all]"),
  type = "re",
)
# Predicted values of y
# 
#    x | Predicted |         95% CI
# ---------------------------------
# 0.20 |     -1.09 | [-5.93,  3.74]
# 0.28 |     -0.08 | [-4.89,  4.73]
# 0.38 |      0.99 | [-3.80,  5.78]
# 0.47 |      2.06 | [-2.71,  6.84]
# 0.58 |      3.38 | [-1.39,  8.14]
# 0.67 |      4.51 | [-0.26,  9.28]
# 0.77 |      5.70 | [ 0.92, 10.48]
# 1.00 |      8.44 | [ 3.61, 13.27]
# 
# Adjusted for:
# * Subject = 0 (population-level)
# 
# Intervals are prediction intervals.

# When run on the lme() model, predicted values & PIs are missing:
ggpredict(
  Model.nlme,
  terms = c("x [all]"),
  type = "re",
)
# Predicted values of y
# 
#    x
# ----
# 0.20
# 0.28
# 0.38
# 0.47
# 0.58
# 0.67
# 0.77
# 1.00
# 
# Adjusted for:
# * Subject = 0 (population-level)
# 
# Intervals are prediction intervals.

If I use correlation = corAR1(), it produces the same results as above. The same thing also happens if I explicitly call terms = c("x [all]", "Subject [0]"), as shown below:
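
ggpredict(
  Model.nlme,
  terms = c("x [all]", "Subject [0]"),
  type = "re"
)
# same empty output as above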

When I add the intended autocorrelation structure, I get predicted values and PIs, but only for the first level of the Subject factor:

Model.nlme <- lme(
  fixed = y ~ x,
  random = ~ 1 | Subject, 
  correlation = corAR1(form = ~ x | Subject),
  data = Data,
)

ggpredict(
  Model.nlme,
  terms = c("x [all]"),
  type = "re",
)
# Predicted values of y
# 
#    x | Predicted |         95% CI
# ---------------------------------
# 0.20 |      1.64 | [-1.44,  4.71]
# 0.28 |      2.82 | [-0.23,  5.87]
# 0.38 |      4.07 | [ 1.03,  7.12]
# 0.47 |      5.33 | [ 2.29,  8.36]
# 0.58 |      6.86 | [ 3.82,  9.90]
# 0.67 |      8.18 | [ 5.13, 11.24]
# 0.77 |      9.58 | [ 6.50, 12.66]
# 1.00 |     12.78 | [ 9.61, 15.95]
# 
# Adjusted for:
# * Subject = 3
# 
# Intervals are prediction intervals.

Am I making an error somewhere? Or is there a better way to get the PIs that I want? Thanks!

Jem Arnold

1 Answer


Unfortunately, I think you might be stuck.

The underlying problem is that broom.mixed::tidy() doesn't return the standard deviations of the BLUPs:

library(broom.mixed)
tidy(Model.lme4, effects = "ran_vals") ## includes SDs
tidy(Model.nlme, effects = "ran_vals") ## doesn't
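
A quick way to see the difference is to check for a std.error column in the tidied random effects (a hedged check; the exact column names may depend on your broom.mixed version):

"std.error" %in% names(tidy(Model.lme4, effects = "ran_vals")) ## TRUE
"std.error" %in% names(tidy(Model.nlme, effects = "ran_vals")) ## FALSE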

However, this isn't really broom.mixed's fault. The problem is that at present the nlme package doesn't offer any option to return the standard deviations of the BLUPs (see e.g. this r-sig-mixed-models@r-project.org thread, which discusses the issue but peters out without providing a solution).

In order to fix this, as best I can tell, you (or someone) would have to dig into the guts of nlme and figure out how to construct the SDs for the BLUPs. If I were doing it I would start with Bates et al. (2015), equations 56-59, and see whether it's possible to extract the corresponding components from an lme fit. In particular, see section 2.2 of Pinheiro and Bates (2000) for the notation and framework used in nlme: eq. 2.22, p. 71, gives the expression for the BLUPs; hopefully one could match that up with the lme code and derive the corresponding expression for the SDs ...
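
For reference, here is the standard result one would be trying to reproduce, as a sketch in generic mixed-model notation (which may not match Pinheiro and Bates' parameterization exactly): for $y_i = X_i \beta + Z_i b_i + \epsilon_i$ with $b_i \sim N(0, \Psi)$ and marginal covariance $V_i = Z_i \Psi Z_i^\top + \sigma^2 \Lambda_i$ (where $\Lambda_i$ carries the AR1 structure),

$$\hat{b}_i = \Psi Z_i^\top V_i^{-1} (y_i - X_i \hat{\beta}), \qquad \operatorname{Var}(b_i \mid y_i) = \Psi - \Psi Z_i^\top V_i^{-1} Z_i \Psi,$$

with the caveat that this conditional variance ignores the uncertainty in $\hat{\beta}$ and in the estimated variance parameters.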

Alternatively: you can fit an AR1 model in glmmTMB, which should be tidy-able/ggpredict-able. Code is shown below; I think these are equivalent models, but the example data you've given above doesn't support fitting an AR1 model [we get a singular fit], so I can't be sure.

## define a per-subject observation-number variable (factor)
## alt. tidyverse: Data |> group_by(Subject) |> mutate(obs = seq(n())) |>
##                  mutate(across(obs, factor))
Data2 <- (Data
    |> split(Data$Subject)
    |> lapply(FUN = \(x) transform(x, obs = seq(nrow(x))))
    |> do.call(what = rbind)
    |> transform(obs = factor(obs))
)
Model.nlme.acf <- update(Model.nlme,
                         data = Data2,
                         correlation = corAR1(form = ~obs))
library(glmmTMB)
Model.glmmTMB <-  glmmTMB(y ~ x + (1|Subject) + ar1(0 + obs|Subject),
                          data = Data2)
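
If the AR1 fit converges on the full dataset, the glmmTMB model should then work with essentially the original call (untested here, because of the singular fit on the example subset):

ggpredict(
  Model.glmmTMB,
  terms = c("x [all]"),
  type = "re"
)
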
Ben Bolker
  • Thanks for this. I will dig into those options. Regarding the singular fit, yeah this is just a subset of the original data. I'll try it on the full dataset. – Jem Arnold Jun 21 '22 at 07:23
  • So I'll have to do a lot more reading to wrap my head around `glmmTMB`, but I think this is a working solution. At least from a visual approximation, the glmmTMB polynomial fit on the whole dataset seems more appropriate compared to the equivalent `nlme` fit, which returns PIs for only the first `Subject`. https://i.imgur.com/9yQRosh.png Thanks for pointing me in the right, or at least a productive, direction! – Jem Arnold Jun 23 '22 at 17:23
  • A couple of follow-up questions if you get a chance: is there any difference in this case between `ar1(0 + obs|Subject)` and `ar1(obs - 1|Subject)`, as I've seen elsewhere? And why, in the figure I commented, does the glmmTMB PI seem smaller at the far right, where the between-subject variability is highest? I would expect the PI to be wider at that point? – Jem Arnold Jun 23 '22 at 17:24
  • Oh, maybe to hypothesise an answer to the second of my own questions: In this dataset every subject has an observation at 100% on the x-axis, while at all other x values the subjects are scattered at unique values. So is the higher effective n at 100% shrinking the standard error from which CI & PI are produced? – Jem Arnold Jun 23 '22 at 17:28
  • Answering the easy question: `0 + obs` and `obs - 1` (or `-1 + obs`) are exactly equivalent. – Ben Bolker Jun 23 '22 at 20:02
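
A quick way to verify that equivalence (a hedged check; check.attributes = FALSE is used in case the two model matrices carry slightly different bookkeeping attributes):

m1 <- model.matrix(~ 0 + obs, data = Data2)
m2 <- model.matrix(~ obs - 1, data = Data2)
all.equal(m1, m2, check.attributes = FALSE) ## TRUE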