
Although I have 276 independent observations across 5 sites (lowest number of observations per site: 23), I get a singularity warning and low power when fitting a model with one categorical factor (two levels) and one random factor (site). Could anybody tell me why that happens?

Here is a reproducible example:

dataset: https://www.dropbox.com/s/azpnhnemgyo0i72/example.xlsx?dl=0

require(readxl)  # read_excel()
require(lme4)    # lmer()

# read the first sheet into a plain data frame
df <- data.frame(read_excel("Example.xlsx", sheet = 1))

# mixed model: fixed effect of sand.avail, random intercept per site
m1 <- lmer(ds.lac ~ sand.avail + (1|site), data = df)

summary(m1)

# post hoc power simulation for the sand.avail effect
simr::powerSim(m1, nsim = 100)

# results in:
# boundary (singular) fit: see ?isSingular
# [simr progress output omitted]
# Power for predictor 'sand.avail', 9.00% (95% confidence interval): (4.20, 16.40)
# Test: Kenward Roger (package pbkrtest)
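For reference, a minimal check on the fitted model (assuming m1 above) shows where the singularity comes from: with a single scalar random effect, a singular fit means the site variance is estimated as essentially zero.

isSingular(m1)  # TRUE for this fit
VarCorr(m1)     # the site standard deviation should be at or near 0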

  • Hi @Agus camacho! I think you can improve your post by including some data (`dput()`) and by saying in which part of the process you are getting the warning: fitting or simulating the model. I suggest you move your question to Cross Validated (it is more about stats than code). Singular models may appear for different reasons, commonly due to lack of variation in some variables: a description of the variables may give you the answer. The response variable in the model seems to have many zeros: that can be the source of the warning and may also indicate the need for another model – Jose Dec 20 '21 at 13:56
  • doesn't `sand.avail` have 3 rather than 2 levels? – Ben Bolker Dec 20 '21 at 16:07

1 Answer


As suggested by @Jose, this really belongs on CrossValidated.

library(ggplot2)

# dodge points by sand.avail within site; stat_sum scales point size
# by the number of overlapping observations
pd <- position_dodge(width = 0.75)
gg1 <- ggplot(df,
       aes(x = site, y = ds.lac, colour = sand.avail)) +
    stat_sum(position = pd, alpha = 0.8)
ggsave("powerSim.png", plot = gg1)

Also as suggested by Jose, there are a number of reasons this data set doesn't have as much information as you would like (quick checks for each point are sketched after this list):

  • most (75%) of the observations are 0 (mean(df$ds.lac==0) counts the proportion)
  • you only have all three sand.avail types available at one site, two of the sites only have one type available (thus there will be a lot of overlap between the random effect of site and the effect of sand.avail)
  • for modern mixed models, 5 levels of the grouping variable is generally considered a minimum
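A minimal sketch of these checks, assuming the df from the question (columns ds.lac, sand.avail, site):

mean(df$ds.lac == 0)               # proportion of zeros in the response (~0.75)
with(df, table(site, sand.avail))  # which sand.avail types occur at each site
length(unique(df$site))            # number of levels of the grouping variable (5)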

Furthermore, I have a concern about what powerSim is doing here; this seems to be a post hoc power analysis (i.e., estimating the power of a data set you have already gathered), which is widely criticized by statisticians. It would be perfectly reasonable to say "why isn't the effect of sand.avail significant for this model, fitted to these data?" (e.g. car::Anova(m1) gives p=0.66), but asking about the power for this data set is conceptually problematic.
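For completeness, a sketch of that test (assuming m1 from the question; the F variant uses the Kenward-Roger approximation via pbkrtest):

library(car)
Anova(m1)                        # Wald chi-square test; p = 0.66 here
Anova(m1, test.statistic = "F")  # Kenward-Roger F test (requires pbkrtest)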

[Figure: ds.lac by site, coloured by sand.avail (the plot produced by the ggplot code above)]

  • Dear @Ben Bolker, thank you very much for your advice. I really thought I was on Cross Validated when I made the question; maybe I inadvertently got out of it when I had to log in to ask. Is there a way to move questions from Stack Overflow to Cross Validated? The warning came during fitting. I think I understand the issue after your explanation. However, why a minimum of five levels for the grouping variable? – Agus camacho Dec 20 '21 at 19:45
    If a couple more sufficiently high-rep users vote to close & migrate, it will go automatically (slightly easier than deleting and reposting, as I would have to repost my answer ...) "How many levels is enough" is discussed in Gelman and Hill 2007 *Applied Regression Modeling* (it should be in the GLMM FAQ as well, but might not be); also see https://peerj.com/articles/1114/ – Ben Bolker Dec 20 '21 at 20:34