1

I was fitting a mixed model with random intercept and random slope for longitudinal data using the nlme package in R, similar to the model below (using an artificial and very small dataset for illustration, but the types of variables are similar):

set.seed(1)
Sex = sample(rep(c(0,1), times = 50))
set.seed(2)
BT = sample(rep(c(0,1), times = 50))
AT = abs(round(rnorm(100, 6.5, 4),0))
Age = abs(round(rnorm(100, 3.8, 2),2))
outcome <- abs(round(rnorm(100, 85, 10), 0))
ID <- seq(1,100)

data <- data.frame(ID = ID, Sex = Sex,Age = Age, BT = BT, AT = AT, outcome = outcome)

ids_list <- list()
ids <- seq(1,100)
set.seed(1)
reps <- sample(rep(2:5, each = 25))
for (i in 1:length(ids)){
  ids_list[[i]] <- rep(i, each = reps[i])
}


time_list <- list()
for (i in 1:length(ids)){
  time_list[[i]] <- sort(round(runif(reps[i], 0, 15),0), decreasing = F)
}


new_df <- data.frame(ID = unlist(ids_list), Time = unlist(time_list))

df <- merge(data, new_df, by = "ID")

> head(df)
  ID Sex  Age BT AT outcome Time
1  1   1 0.40  0  5      90    4
2  1   1 0.40  0  5      90    4
3  1   1 0.40  0  5      90    9
4  1   1 0.40  0  5      90   10
5  2   0 1.32  0  7      90    3
6  2   0 1.32  0  7      90    7

model <- lme(outcome ~ Sex + BT + Age + AT*Time, random = ~ 1 + Time|ID,
                   data = df, na.action = na.exclude, method = "REML")

which I wanted to visualise using ggpredict in the ggeffects package. Unfortunately, when using ggpredict,

ggpredict(model, terms = c("AT", "Time"), type = "re") 

I get the following:

# Predicted values of outcome
Error in tapply(x$predicted, list(x$group), NULL) : 
  arguments must have same length

Using other variables to display (in this case, e.g. BT is a factor), I get the same error.

Using the lme4 package and fitting the same model with lmer(), it does work (however I would like to use the nlme package, which is supported by ggeffects if I am not mistaken). So far, reading the vignette and googling haven't helped me fix the problem unfortunately.

yentl02
  • 25
  • 4
  • this isn't reproducible so far; there's no `outcome` variable in your sample data. If you call `traceback()` immediately after the error, what are the results? This *might* be a problem in `broom.mixed` ... – Ben Bolker Aug 31 '22 at 14:11
  • @BenBolker thank you very much for your response. I added a new way to create the dataset (however it gives "system is singular", so I didn't manage to get it working for the model yet). When using ```traceback()```, I get: 4: stop("arguments must have same length") 3: tapply(x$predicted, list(x$group), NULL) 2: print.ggeffects(x) 1: (function (x, ...) UseMethod("print"))(x) – yentl02 Aug 31 '22 at 16:21

1 Answers1

2

tl;dr This looks like a bug in ggpredict to me. (I have posted a link to this answer to your issue on the Github repo; it is now fixed in the development version of ggpredict). I can't replicate your error, but I can generate an example that gives clearly bogus/problematic results.

For the record, this is what type = "re" is supposed to do:

... still returns population-level predictions, however, unlike ‘type = "fixed"’, intervals also consider the uncertainty in the variance parameters (the mean random effect variance, see Johnson et al. 2014 for details) and hence can be considered as prediction intervals.

fit models

library(lme4)
library(nlme)
library(ggeffects)

lmer_fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = TRUE)
lme_fit <- lme(Reaction ~ Days, random = ~ Days | Subject, sleepstudy, method = "REML")
all.equal(fixef(lme_fit), fixef(lmer_fit)) ## TRUE

ggpredict() doesn't give me an error, but it does return empty predictions for the lme fit ...

ggpredict(lme_fit, terms = "Days", type = "re")  ## empty!
ggpredict(lmer_fit, terms = "Days",  type = "re")

Let's dig in a little bit and see what's going on (I had to debug my way down several levels to figure out where to look ...)

debug(ggeffects:::get_predictions_lme)
debug(ggeffects:::get_predictions_merMod)

If we re-run the ggpredict() calls, we eventually get down to some code that is equivalent to this:

fitfram <- data.frame(Days = 0:9, Subject = 0)
predict(lme_fit, newdata = fitfram, level = 1, type = "response")

where level = 1 means to fit within the first grouping level, i.e. at the level of subjects in this case. Setting level = 1 is bad for two reasons: (1) it seems to disagree with the documentation (level = 0 is for population-level predictions); (2) because Subject is set to a level that's not included in the original data, and we're trying to predict at the subject level, this returns all NA values — presumably why the predictions are empty in the printed output.

(It's also a little weird to use type = "response", this is not a legitimate argument to predict.lme, but it gets swallowed by the ... argument and ignored)

How does this compare with what happens with the lmer fit? In this case, the code we get to is equivalent to

predict(lmer_fit, newdata = fitfram, type = "response", re.form = NULL,
        allow.new.levels = TRUE)

which is slightly wonky but works: (1) re.form = NULL says to predict at the lowest level/including all of the random effects (like level = 1 for lme), (2) but allow.new.levels = TRUE specifies that any predictions for unobserved levels (which is all of them in this case) are done at the population level. So we get non-NA answers. It would be more natural to use re.form = NA, which explicitly specifies population-level prediction, but it does work the way it's written.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thanks very much again for the elaborate investigation! As posted on the [Github issue](https://github.com/strengejacke/ggeffects/issues/267), this solution now gives me results for ```type="fixed"```, however unfortunately the error I get for ```type="re"``` is ```Error in FUN(X[[i]], ...) : object 'predicted' not found```. – yentl02 Sep 02 '22 at 08:59
  • Seems that the issue has been fixed using your advice on [Github](https://github.com/strengejacke/ggeffects/issues/267), thank you very much! – yentl02 Sep 02 '22 at 14:03