6

I would like to conduct a simulation-based power analysis for a linear mixed model in lmer with repeated measures from scratch. I understand that simr might be the package to go with. However, I do not understand what I have to do and am not able to make sense of anything I read online.

I want to test a random intercept model where

  • y (a z-standardized outcome variable) ~
    • a + b (z-standardized covariates)
    • + c + d (binary covariates)
    • + time1 + time2 (time dummies indicating measurement at t1 or t2 as opposed to t0)
    • + group (a group with two expressions)
    • + group:time1 + group:time2
    • + (1 | participant)

i.e., three measurements are nested in participants using a random intercept.

How do I simulate data so that I am able to find out what would be an appropriate sample size for an interaction effect (e.g. group:time1) of beta = .3 to be found with a power of .80? And which additional info do I need to do that?

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453

1 Answers1

4

There are packages such as simr which will do all of this, and more, for you (and will handle unbalanced designs too), but here is a simple approach to simulating data for a mixed model, which you can then use in a power analysis, from scratch:

There are several important parameters to consider:

  • the overall data size
  • whether the data are balanced or not
  • the number of participants
  • the fixed effects
  • the random effects variance
  • the residual variance

Here we use a balanced design, which makes the setup easier, but it would not be too hard to extend it to unbalanced designs:

library(lme4)
set.seed(15)

N <- 10    # number of unique values of a and b for each participant
N.participant <- 20   # number of participants
betas <- c(10, 1, 2, 3, 4, 5, 6, 7)   # fixed effects
s <- 0.5   # variance of random effects
e <- 1  # residual variance

# form the balanced data frame
dt <- expand.grid(a = rnorm(N), b = rnorm(N), c = c(0,1), d = c(0,1), time = c(0,1,2), group = c(0,1), participant = LETTERS[1:N.participant])

# form the model matrix for fixed effects
X <- model.matrix(~ a + b + c + d + time + group + time:group, data = dt)

# set a vector of 1s for the outcome. This is needed so we can easily compute the
# model matrix for random effects (in this case just the random intercetps)
dt$y <- 1

# The final model formula
myFormula <- "y ~ a + b + c + d + time + group + time:group + (1 | participant)"

# obtain the model matrix for random effects
foo <- lFormula(eval(myFormula), dt)
Z <- t(as.matrix(foo$reTrms$Zt))

# simulate the random effects
u <- rnorm(N.participant , 0, s)

# simulate y using the general mixed model equation and residual variance e
dt$y <- X %*% betas + Z %*% u + rnorm(nrow(dt), 0, e)

# fit the model
m0 <- lmer(myFormula, dt)
summary(m0)

You would run this a number of times for each set of parameters, with different seeds of course, and compute the relevant Monte Carlo estimates, and the achieved power for each set.

Robert Long
  • 5,722
  • 5
  • 29
  • 50
  • Huge thanks for your answer and your willingness to help! I am still not able to understand everything you write, I guess it might help me if you`d either comment the code a bit more. Or could you help me how I would do it with simr? All the best to you! – Sebastian N Jan 26 '22 at 17:24
  • 1
    You're welcome. I have already included several comments in the code already but if there is anything in particular that you don't understand, I can happily add some more. – Robert Long Jan 26 '22 at 19:17