I recently ran into some rather unexpected behavior of glmer with underdispersed data: the number of eggs laid in 4 nestboxes placed in 53 forest plots. The estimated between-group standard deviation gets stuck at 0 even though there is considerable between-group variation, and the residual standard deviation is not reported.
Here is some simulated data close to the actual data distribution:
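set.seed(20180124)
library(lme4) # v1.1-13
# plot-level mean egg counts for the 53 plots
plot_int <- rnorm(53, exp(2), 1)
# 5 observations per plot
datt <- data.frame(id_plot = rep(1:53, each = 5))
# counts tightly clustered around each plot mean (underdispersed relative to Poisson)
datt$Egg <- ceiling(rnorm(265, plot_int[datt$id_plot], 0.1))
# Poisson GLMM with a random intercept per plot
summary(glmer(Egg ~ 1 + (1 | id_plot), datt, family = "poisson"))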
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: poisson ( log )
Formula: Egg ~ 1 + (1 | id_plot)
Data: datt
     AIC      BIC   logLik deviance df.resid
  1085.9   1093.1   -541.0   1081.9      263
Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.10153 -0.40068 -0.05025  0.30018  0.65060
Random effects:
 Groups  Name        Variance Std.Dev.
 id_plot (Intercept) 0        0
Number of obs: 265, groups:  id_plot, 53
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.09721    0.02153   97.42   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Is this the intended behavior of glmer? Is there a way to detect this situation during model fitting, so that a warning message is issued?
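As a rough illustration of what such a check could look like, here is a minimal sketch that inspects the fitted random-effect parameter directly. It assumes a model with a single scalar random intercept, so that getME(fit, "theta") returns just one value (the between-plot standard deviation on lme4's relative scale, bounded below at 0). The tolerance of 1e-4 and the names fit, theta and singular are arbitrary choices for this example. More recent lme4 releases also provide isSingular() for the same purpose.

```r
library(lme4)

# Same simulated data as in the question
set.seed(20180124)
plot_int <- rnorm(53, exp(2), 1)
datt <- data.frame(id_plot = rep(1:53, each = 5))
datt$Egg <- ceiling(rnorm(265, plot_int[datt$id_plot], 0.1))

fit <- glmer(Egg ~ 1 + (1 | id_plot), data = datt, family = poisson)

# For a single random intercept, theta is one number: the relative
# between-group standard deviation, constrained to be >= 0
theta <- getME(fit, "theta")

# Assumption: anything below this (arbitrary) tolerance counts as "on the boundary"
singular <- any(theta < 1e-4)

if (singular) {
  warning("between-plot variance estimated at (or very near) zero")
}

# The estimated variance components, for reference
print(as.data.frame(VarCorr(fit)))
```

This only tells you that the estimate sits on the boundary, not why it ended up there.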
Fitting the same model with lmer gives sensible estimates of both the residual and the between-group variance. Trying glmer.nb instead gives convergence warnings, and the dispersion parameter, which should be negative (I guess), hits extremely large values. So what would be the best way to model such data, especially when the normal approximation is not an option (low means ...)?
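For anyone wanting to reproduce the comparison, here is a minimal sketch using the simulated data from above. It computes a Pearson dispersion ratio for the Poisson fit (values well below 1 suggest underdispersion) and then fits the Gaussian model with lmer to show the two variance components. The object names (fit_pois, fit_lmm, disp_ratio, vc) are just illustrative, and the dispersion ratio is only a rough diagnostic, not a formal test.

```r
library(lme4)

# Same simulated data as in the question
set.seed(20180124)
plot_int <- rnorm(53, exp(2), 1)
datt <- data.frame(id_plot = rep(1:53, each = 5))
datt$Egg <- ceiling(rnorm(265, plot_int[datt$id_plot], 0.1))

# Poisson GLMM from the question
fit_pois <- glmer(Egg ~ 1 + (1 | id_plot), data = datt, family = poisson)

# Pearson dispersion ratio: sum of squared Pearson residuals over residual df
disp_ratio <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(disp_ratio)

# Gaussian linear mixed model for comparison
fit_lmm <- lmer(Egg ~ 1 + (1 | id_plot), data = datt)

# Between-plot and residual variance components (vcov) and standard deviations (sdcor)
vc <- as.data.frame(VarCorr(fit_lmm))
print(vc[, c("grp", "vcov", "sdcor")])
```

The glmer.nb fit is left out of the sketch on purpose, since it is the one that produces the convergence warnings described above.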