I am using glmmTMB to analyze a negative binomial generalized linear mixed model (GLMM) where the dependent variable is count data (CT), which is over-dispersed.
There are 115 samples (rows) in the relevant data frame. There are two fixed effects (F1, F2) and a random intercept (R), within which is nested a further random effect (NR). There is also an offset, consisting of the natural logarithm of the total counts in each sample (LOG_TOT).
An example of a data frame, df, is:
CT F1 F2 R NR LOG_TOT
77 0 0 1 1 12.9
167 0 0 2 6 13.7
289 0 0 3 11 13.9
253 0 0 4 16 13.9
125 0 0 5 21 13.7
109 0 0 6 26 13.6
96 1 0 1 2 13.1
169 1 0 2 7 13.7
190 1 0 3 12 13.8
258 1 0 4 17 13.9
101 1 0 5 22 13.5
94 1 0 6 27 13.5
89 1 25 1 4 13.0
166 1 25 2 9 13.6
175 1 25 3 14 13.7
221 1 25 4 19 13.8
131 1 25 5 24 13.5
118 1 25 6 29 13.6
58 1 75 1 5 12.9
123 1 75 2 10 13.4
197 1 75 3 15 13.7
208 1 75 4 20 13.8
113 1 8 1 3 13.2
125 1 8 2 8 13.7
182 1 8 3 13 13.7
224 1 8 4 18 13.9
104 1 8 5 23 13.5
116 1 8 6 28 13.7
122 2 0 1 2 13.1
115 2 0 2 7 13.6
149 2 0 3 12 13.7
270 2 0 4 17 14.1
116 2 0 5 22 13.5
94 2 0 6 27 13.7
73 2 25 1 4 12.8
61 2 25 2 9 13.0
185 2 25 3 14 13.8
159 2 25 4 19 13.7
125 2 25 5 24 13.6
75 2 25 6 29 13.5
121 2 8 1 3 13.0
143 2 8 2 8 13.8
219 2 8 3 13 13.9
191 2 8 4 18 13.7
98 2 8 5 23 13.5
115 2 8 6 28 13.6
110 3 0 1 2 12.8
123 3 0 2 7 13.6
210 3 0 3 12 13.9
354 3 0 4 17 14.4
160 3 0 5 22 13.7
101 3 0 6 27 13.6
69 3 25 1 4 12.6
112 3 25 2 9 13.5
258 3 25 3 14 13.8
174 3 25 4 19 13.5
171 3 25 5 24 13.9
117 3 25 6 29 13.7
38 3 75 1 5 12.1
222 3 75 2 10 14.1
204 3 75 3 15 13.5
235 3 75 4 20 13.7
241 3 75 5 25 13.8
141 3 75 6 30 13.9
113 3 8 1 3 12.9
90 3 8 2 8 13.5
276 3 8 3 13 14.1
199 3 8 4 18 13.8
111 3 8 5 23 13.6
109 3 8 6 28 13.7
135 4 0 1 2 13.1
144 4 0 2 7 13.6
289 4 0 3 12 14.2
395 4 0 4 17 14.6
154 4 0 5 22 13.7
148 4 0 6 27 13.8
58 4 25 1 4 12.8
136 4 25 2 9 13.8
288 4 25 3 14 14.0
113 4 25 4 19 13.5
162 4 25 5 24 13.7
172 4 25 6 29 14.1
2 4 75 1 5 12.3
246 4 75 3 15 13.7
247 4 75 4 20 13.9
114 4 8 1 3 13.1
107 4 8 2 8 13.6
209 4 8 3 13 14.0
190 4 8 4 18 13.9
127 4 8 5 23 13.5
101 4 8 6 28 13.7
167 6 0 1 2 13.4
131 6 0 2 7 13.5
369 6 0 3 12 14.5
434 6 0 4 17 14.9
172 6 0 5 22 13.8
126 6 0 6 27 13.8
90 6 25 1 4 13.1
172 6 25 2 9 13.7
330 6 25 3 14 14.2
131 6 25 4 19 13.7
151 6 25 5 24 13.9
141 6 25 6 29 14.2
7 6 75 1 5 12.2
194 6 75 2 10 14.2
280 6 75 3 15 13.7
253 6 75 4 20 13.8
45 6 75 5 25 13.4
155 6 75 6 30 13.9
208 6 8 1 3 13.5
97 6 8 2 8 13.5
325 6 8 3 13 14.3
235 6 8 4 18 14.1
112 6 8 5 23 13.6
188 6 8 6 28 14.1
The random and nested random effects are treated as factors. The fixed effect F1 takes the values 0, 1, 2, 3, 4 and 6. The fixed effect F2 takes the values 0, 8, 25 and 75. I am treating the fixed effects as continuous, rather than ordinal, because I would like to identify monotonic, unidirectional changes in the dependent variable CT rather than up-and-down changes.
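Concretely, the setup before fitting looks something like this (a minimal sketch that just encodes the statements above, with R and NR converted to factors and F1 and F2 left numeric):
df$R  <- factor(df$R)     # random intercept grouping as a factor
df$NR <- factor(df$NR)    # nested random effect as a factor
str(df[, c("F1", "F2", "R", "NR")])   # F1, F2 stay numeric (continuous)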
I previously used the lme4 package to analyze the data as a mixed model:
library(lme4)
m1 <- lmer(CT ~ F1*F2 + (1|R/NR) + offset(LOG_TOT),
           data = df, verbose = FALSE)
This was followed by post hoc analysis with glht from the multcomp package, using the formula (linfct) interface:
library(multcomp)
glht_fixed1 <- glht(m1, linfct = c(
  "F1 == 0",
  "F1 + 8*F1:F2 == 0",
  "F1 + 25*F1:F2 == 0",
  "F1 + 75*F1:F2 == 0",
  "F1 + (27)*F1:F2 == 0"))
glht_fixed2 <- glht(m1, linfct = c(
  "F2 + 1*F1:F2 == 0",
  "F2 + 2*F1:F2 == 0",
  "F2 + 3*F1:F2 == 0",
  "F2 + 4*F1:F2 == 0",
  "F2 + 6*F1:F2 == 0",
  "F2 + (3.2)*F1:F2 == 0"))
glht_omni <- glht(m1)
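In case my intent is unclear: as I understand it, each element of glht_fixed1 tests the simple slope of F1 at a particular value of F2 (i.e. the combination beta_F1 + F2*beta_F1:F2), and glht_fixed2 does the same for F2 at particular values of F1. One of these can be checked by hand against the lme4 fit, for example:
b1 <- fixef(m1)                      # named vector: (Intercept), F1, F2, F1:F2
unname(b1["F1"] + 25 * b1["F1:F2"])  # slope of F1 at F2 = 25; should match the glht estimate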
Here is the corresponding negative binomial glmmTMB model, which I now prefer:
library(glmmTMB)
m2 <- glmmTMB(CT ~ F1*F2 + (1|R/NR) + offset(LOG_TOT),
              data = df, verbose = FALSE, family = "nbinom2")
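(In case it matters for the post hoc step: the fixed effects and their covariance matrix for this model live in the conditional component of the fit, e.g.)
fixef(m2)$cond   # conditional-model fixed effects: (Intercept), F1, F2, F1:F2
vcov(m2)$cond    # their estimated covariance matrix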
According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (or its more recent successor, emmeans).
I followed Ben's suggestion, running
source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))
and I can then use emmeans on the glmmTMB object. For example,
as.glht(emmeans(m2,~(F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813
But this does not seem correct. I can also change F1 and F2 to factors and then try this:
as.glht(emmeans(m2, ~ (F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
0, 0 == 0 -6.721
1, 0 == 0 -6.621
2, 0 == 0 -6.342
3, 0 == 0 -6.740
4, 0 == 0 -6.474
6, 0 == 0 -6.967
0, 8 == 0 -6.694
1, 8 == 0 -6.651
2, 8 == 0 -6.227
3, 8 == 0 -6.812
4, 8 == 0 -6.371
6, 8 == 0 -6.920
0, 25 == 0 -6.653
1, 25 == 0 -6.648
2, 25 == 0 -6.282
3, 25 == 0 -6.766
4, 25 == 0 -6.338
6, 25 == 0 -6.702
0, 75 == 0 -6.470
1, 75 == 0 -6.642
2, 75 == 0 -6.091
3, 75 == 0 -6.531
4, 75 == 0 -5.762
6, 75 == 0 -6.612
But, again, I am not sure how to bend this output to my will. If some kind person could tell me how to correctly carry over the use of formulae in glht and linfct to the emmeans scenario with glmmTMB, I would be very grateful. I have read all the manuals and vignettes until I am blue in the face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?), I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.
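To make concrete the kind of result I am after, here is my (possibly naive) hand calculation of one of the glht-style contrasts directly from the glmmTMB fit, assuming the default coefficient names (Intercept), F1, F2 and F1:F2; what I would like is the emmeans/glht way to get all of these contrasts, with multiplicity adjustment, in one go:
b <- fixef(m2)$cond                  # conditional fixed effects
V <- vcov(m2)$cond                   # their covariance matrix
k <- c("(Intercept)" = 0, "F1" = 1, "F2" = 0, "F1:F2" = 25)  # "F1 + 25*F1:F2 == 0"
est <- sum(k * b[names(k)])
se  <- sqrt(drop(t(k) %*% V[names(k), names(k)] %*% k))
c(estimate = est, SE = se, z = est/se)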
The glht approach to post hoc testing carries over directly to the glmmADMB package, but glmmADMB is about 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each involving 300,000 fits of the negative binomial mixed model, so speed is essential.
Many thanks for your suggestions and help!