
I am working with a large dataset and analyzing a continuous dependent variable with a linear mixed effects model using the R package lme4. I am also using the extension lmerTest, which makes it possible to produce various plots and to compute the p-values associated with fixed and random terms.

When I run rand() to obtain a p-value for each random term, I get the following error:

Error in anova.merMod(object = object, ... = ...) : models were not all fitted to the same size of dataset

This is because one of my random terms includes missing values, while others don't.

My question is: within the rand function, how can I deal with differences in dataset sizes? Is there an argument that automatically omits NAs? I looked at the help page for that function, but the documentation is very limited.

Thanks!

Mehdi.K
  • You can use `complete.cases` on all the variables found in the bigger of the two models and use that to create a data frame that is consistent across models. – ekstroem Jul 17 '17 at 23:45
  • So for example if my code looks like: `model <- lmer(response ~ fixed1 + fixed2 + (1|random1) + (1|random2) + (1|random3))` And I want to run `rand(model)`, how exactly would `complete.cases` fit in? – Mehdi.K Jul 18 '17 at 00:15
  • Always use the data argument of lme4 functions. Omitting it is not supported well (and not good practice). – Roland Jul 18 '17 at 03:54
  • I usually use `with(data, lmer())`. Is that fine too? – Mehdi.K Jul 18 '17 at 04:05
  • No, that's not fine. The lme4 developers don't really test such usage. They expect users to use the data argument. – Roland Jul 18 '17 at 09:10
  • Thank you, I didn't know that. However, the results are exactly the same. I suppose the problems arise only in certain cases. – Mehdi.K Jul 20 '17 at 01:34

1 Answer

Here's an example using the example data from the lmerTest package. In the example we wish to run this code:

library(lmerTest)
m <- lmer(Preference ~ sens2+Homesize+(1+sens2|Consumer), data=carrots)
rand(m)

First we identify which rows are complete for all the variables in the largest of the models. I use the pipe and functions from the tidyverse below, but you could do the same with `with`. All the variables from the full model should be included here:

library(dplyr)
cc <- carrots %>% dplyr::select(Preference, sens2, Homesize, Consumer) %>% complete.cases()
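If you would rather avoid the pipe and `select` altogether, the same kind of logical vector can be built in base R. Here is a minimal sketch on a small made-up data frame; `toy`, `model_vars`, `cc_toy` and `toy_cc` are hypothetical names for illustration and are not part of lmerTest:

```r
# Toy data frame (hypothetical, not from lmerTest) with scattered NAs
toy <- data.frame(
  response = c(1.2, 3.4, NA, 2.2, 5.1),
  fixed1   = c(0.5, NA, 0.1, 0.9, 0.3),
  random1  = factor(c("a", "a", "b", "b", NA)),
  unused   = c(NA, 1, 2, 3, 4)  # not in the model, so its NAs do not matter
)

# Every variable that appears in the largest model
model_vars <- c("response", "fixed1", "random1")

# TRUE for rows with no NA in any of the model variables
cc_toy <- complete.cases(toy[, model_vars])
cc_toy

# Optionally keep only the complete rows in a new data frame
toy_cc <- toy[cc_toy, ]
```

Only the columns listed in `model_vars` are checked, so missing values in columns the model never uses do not cause rows to be dropped.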

cc is now a logical vector marking the rows that have complete observations. Those are the rows we should use throughout the analyses. We ensure this by adding the subset argument:

m <- lmer(Preference ~ sens2+Homesize+(1+sens2|Consumer), subset=cc, data=carrots)
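As an alternative to `subset`, you can follow the suggestion from the comments and build one data frame that is consistent across models, then fit on that. This is only a sketch: `carrots_cc` and `m_cc` are names I made up, and I am assuming `rand()` behaves the same way in your installed version of lmerTest:

```r
library(lmerTest)  # attaches lme4 and provides the carrots data

# Every variable that appears in the full model
model_vars <- c("Preference", "sens2", "Homesize", "Consumer")

# Keep only rows that are complete for those variables
cc <- complete.cases(carrots[, model_vars])
carrots_cc <- carrots[cc, ]

# Fit on the complete-case data, so every refitted model sees the same rows
m_cc <- lmer(Preference ~ sens2 + Homesize + (1 + sens2 | Consumer),
             data = carrots_cc)

# Tests of the random terms
rand(m_cc)
```

Because the rows with missing values are dropped before fitting, all models compared inside `rand()` should be fitted to the same number of observations.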
ekstroem
  • However, in some cases I obtain the following error message: `Error in select(., ..., : unused arguments (...)`, where "..." are all the effects I am testing for. What does that mean? – Mehdi.K Jul 21 '17 at 00:50
  • I found the reason behind this issue. The `select` function is not unique to the `dplyr` package, and can thus clash with other packages. Using `dplyr::select` worked. – Mehdi.K Jul 21 '17 at 03:10