How do I get properly accounted for nested random effects with lme4?

Question

I have a data frame with subject, wd, and group variables, and a value response variable. Each subject is assigned to one group and has 7 measurements taken over each weekday. Because each subject is completely nested within a group, I want to use a nested random effects model with subject and group, as well as adding a third random effect for wd. Currently, I am using this to do so:

model = lmer(value ~ 1+ (1|wd) + (1|group) + (1|subject), 
             data = dframe, REML = 0)

I found the code I based this off of on page 40 of this guide. I have used both REML = TRUE and REML = 0. However, when I use VarCorr(model)$variances, I get

Groups   Name        Std.Dev.  
subject  (Intercept) 94.9534363
wd       (Intercept) 42.5931401
group    (Intercept)  0.0015608
Residual              0.9589836

This group variance conflicts with the code that I used to generate the data, which has group means of 36.9, 28.78, and -15.269. When I look at "residuals" for the predicted random effects (using ranef) vs. true random effects, I get residuals with very high correlation to the group they are in (if I modeled residuals ~ group, the R-squared value would be over 0.9).

How do I properly fit a nested random effects model in R? I prefer to use lme4, but any package will suffice.

Here is the code I used to generate the data:

library(dplyr)
generate_data <- function(n = 10, g = 3, seed = 1, mean.overall = 300,
                          sigma.g = 50, sigma.wd = 50, 
                          sigma.subject = 100, sigma. = 30) {
    set.seed(seed)
    means.wd = rnorm(7) * sigma.wd
    means.g = rnorm(g) * sigma.g
    means.subject = rnorm(n*g) * sigma.subject
    dframe = data.frame(subject = rep(1:(g*n), each = 7),
                        wd = rep(1:7, g*n), 
                        group = rep(1:g, each = (7*n)))
    dframe = mutate(dframe,
       value = mean.overall + means.wd[wd] +    
           means.subject[subject] + means.g[group] + rnorm(7*g*n),
       subject = factor(subject, levels = 1:(n*g)),
       wd = factor(wd), 
       group = factor(group, levels = 1:g))
    dframe$value = round(pmax(5,dframe$value))
    truefx = list(wd = means.wd, group = means.g, 
                  subject = means.subject)
    list(data = dframe, effects = truefx)
}

dframe = generate_data()$data

I'm having trouble reading in your data, though am not sure what the problem is. Would you check that it works for you? — Aaron left Stack Overflow, May 15 '15 at 17:08
For some reason it wasn't working before in quote blocks, but now it does it line-by-line with code formatting. — Max Candocia, May 15 '15 at 17:14
Can you show the code for how you generated these data? Knowing the "true" model you simulated would likely make it clearer what model you are trying/should fit. — aosmith, May 15 '15 at 18:51
I added the function above. The data I am using uses the default parameters (including seed) for the data. — Max Candocia, May 15 '15 at 19:16
I took the liberty of replacing the dput dump with the code to generate it, as well as editing your function so that it was readable without scrolling. — Aaron left Stack Overflow, May 15 '15 at 19:44
I think you are fitting the model you want to fit. Your estimates will approximate the "truth" on average, across many simulated datasets (so you'll need to remove the seed from the data generation). This example is a good one to show how poor variance estimates can be when we have very few levels. — aosmith, May 15 '15 at 19:45
Yes to @aosmith, though I'm not convinced that with this few levels the variance estimates really will approximate the truth on average. Also if it's really the variance estimates you're interested in, REML is generally thought to provide better variance estimates. — Aaron left Stack Overflow, May 15 '15 at 19:49
I am primarily interested in the random effects per subject. I was using the variance of the random effect to show that there is something problematic when it is very low. — Max Candocia, May 15 '15 at 20:05
I agree, @Aaron, it will very likely be biased low on average and have an especially odd-looking distribution. That's what's so cool about simulations, you can really get to see these problems crop up. — aosmith, May 15 '15 at 20:09

score 0 · Answer 1 · answered May 15 '15 at 17:11

0

I'd guess that you want group as a fixed effect, as there's only a couple levels. Also the nesting of weekday within subject feels unneeded as there's only one observation for each subject/weekday combination. If so, all you should need is

lmer(value ~ group + (1|subject), data = dframe)

It's not clear that weekday is truly nested in subject though; if these were all the same weekdays for all subjects, something else might be more appropriate. That's a better question for stats.stackexchange.com though.

If you really did want to nest these, something like this might work.

lmer(value ~ (1|group/subject/wd), data = dframe)

answered May 15 '15 at 17:11

Aaron left Stack Overflow

36,704
7
77
142

I am currently simulating the data, and I have yet to determine what the final group sizes/number of groups will be, which is why I am asking. – Max Candocia May 15 '15 at 17:19
Also, I am not looking to nest the `wd` variable. I will probably include it as a fixed effect. I tried running `model = lmer(value ~ 1+(1|wd) + (1|group/subject),data = dframe)`, but I still get the same issues. – Max Candocia May 15 '15 at 17:23

How do I get properly accounted for nested random effects with lme4?

1 Answers1