I am using glmmTMB to analyze a negative binomial generalized linear mixed model (GLMM) where the dependent variable is count data (CT), which is over-dispersed.
There are 115 samples (rows) in the relevant data frame. There are two fixed effects (F1, F2) and a random intercept (R), within which is nested a further random effect (NR). There is also an offset, consisting of the natural logarithm of the total counts in each sample (LOG_TOT).
An example of a data frame, df, is:
CT F1 F2 R NR LOG_TOT
77 0 0 1 1 12.9
167 0 0 2 6 13.7
289 0 0 3 11 13.9
253 0 0 4 16 13.9
125 0 0 5 21 13.7
109 0 0 6 26 13.6
96 1 0 1 2 13.1
169 1 0 2 7 13.7
190 1 0 3 12 13.8
258 1 0 4 17 13.9
101 1 0 5 22 13.5
94 1 0 6 27 13.5
89 1 25 1 4 13.0
166 1 25 2 9 13.6
175 1 25 3 14 13.7
221 1 25 4 19 13.8
131 1 25 5 24 13.5
118 1 25 6 29 13.6
58 1 75 1 5 12.9
123 1 75 2 10 13.4
197 1 75 3 15 13.7
208 1 75 4 20 13.8
113 1 8 1 3 13.2
125 1 8 2 8 13.7
182 1 8 3 13 13.7
224 1 8 4 18 13.9
104 1 8 5 23 13.5
116 1 8 6 28 13.7
122 2 0 1 2 13.1
115 2 0 2 7 13.6
149 2 0 3 12 13.7
270 2 0 4 17 14.1
116 2 0 5 22 13.5
94 2 0 6 27 13.7
73 2 25 1 4 12.8
61 2 25 2 9 13.0
185 2 25 3 14 13.8
159 2 25 4 19 13.7
125 2 25 5 24 13.6
75 2 25 6 29 13.5
121 2 8 1 3 13.0
143 2 8 2 8 13.8
219 2 8 3 13 13.9
191 2 8 4 18 13.7
98 2 8 5 23 13.5
115 2 8 6 28 13.6
110 3 0 1 2 12.8
123 3 0 2 7 13.6
210 3 0 3 12 13.9
354 3 0 4 17 14.4
160 3 0 5 22 13.7
101 3 0 6 27 13.6
69 3 25 1 4 12.6
112 3 25 2 9 13.5
258 3 25 3 14 13.8
174 3 25 4 19 13.5
171 3 25 5 24 13.9
117 3 25 6 29 13.7
38 3 75 1 5 12.1
222 3 75 2 10 14.1
204 3 75 3 15 13.5
235 3 75 4 20 13.7
241 3 75 5 25 13.8
141 3 75 6 30 13.9
113 3 8 1 3 12.9
90 3 8 2 8 13.5
276 3 8 3 13 14.1
199 3 8 4 18 13.8
111 3 8 5 23 13.6
109 3 8 6 28 13.7
135 4 0 1 2 13.1
144 4 0 2 7 13.6
289 4 0 3 12 14.2
395 4 0 4 17 14.6
154 4 0 5 22 13.7
148 4 0 6 27 13.8
58 4 25 1 4 12.8
136 4 25 2 9 13.8
288 4 25 3 14 14.0
113 4 25 4 19 13.5
162 4 25 5 24 13.7
172 4 25 6 29 14.1
2 4 75 1 5 12.3
246 4 75 3 15 13.7
247 4 75 4 20 13.9
114 4 8 1 3 13.1
107 4 8 2 8 13.6
209 4 8 3 13 14.0
190 4 8 4 18 13.9
127 4 8 5 23 13.5
101 4 8 6 28 13.7
167 6 0 1 2 13.4
131 6 0 2 7 13.5
369 6 0 3 12 14.5
434 6 0 4 17 14.9
172 6 0 5 22 13.8
126 6 0 6 27 13.8
90 6 25 1 4 13.1
172 6 25 2 9 13.7
330 6 25 3 14 14.2
131 6 25 4 19 13.7
151 6 25 5 24 13.9
141 6 25 6 29 14.2
7 6 75 1 5 12.2
194 6 75 2 10 14.2
280 6 75 3 15 13.7
253 6 75 4 20 13.8
45 6 75 5 25 13.4
155 6 75 6 30 13.9
208 6 8 1 3 13.5
97 6 8 2 8 13.5
325 6 8 3 13 14.3
235 6 8 4 18 14.1
112 6 8 5 23 13.6
188 6 8 6 28 14.1
The random and nested random effects are treated as factors. The fixed effect F1 takes the values 0, 1, 2, 3, 4 and 6. The fixed effect F2 takes the values 0, 8, 25 and 75. I am treating the fixed effects as continuous, rather than ordinal, because I would like to identify monotonic, unidirectional changes in the dependent variable CT rather than up-and-down changes.
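Concretely, the setup before fitting looks something like this (a minimal sketch that just encodes the statements above, with R and NR converted to factors and F1 and F2 left numeric):
df$R  <- factor(df$R)     # random intercept grouping as a factor
df$NR <- factor(df$NR)    # nested random effect as a factor
str(df[, c("F1", "F2", "R", "NR")])   # F1, F2 stay numeric (continuous)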
I previously used the lme4 package to analyze the data as a mixed model:
library(lme4)
m1 <- lmer(CT ~ F1*F2 + (1|R/NR) + offset(LOG_TOT),
           data = df, verbose = FALSE)
This was followed by post hoc analysis with glht from the multcomp package, using the formula (linfct) interface:
library(multcomp)
glht_fixed1 <- glht(m1, linfct = c(
  "F1 == 0",
  "F1 + 8*F1:F2 == 0",
  "F1 + 25*F1:F2 == 0",
  "F1 + 75*F1:F2 == 0",
  "F1 + (27)*F1:F2 == 0"))
glht_fixed2 <- glht(m1, linfct = c(
  "F2 + 1*F1:F2 == 0",
  "F2 + 2*F1:F2 == 0",
  "F2 + 3*F1:F2 == 0",
  "F2 + 4*F1:F2 == 0",
  "F2 + 6*F1:F2 == 0",
  "F2 + (3.2)*F1:F2 == 0"))
glht_omni <- glht(m1)
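In case my intent is unclear: as I understand it, each element of glht_fixed1 tests the simple slope of F1 at a particular value of F2 (i.e. the combination beta_F1 + F2*beta_F1:F2), and glht_fixed2 does the same for F2 at particular values of F1. One of these can be checked by hand against the lme4 fit, for example:
b1 <- fixef(m1)                      # named vector: (Intercept), F1, F2, F1:F2
unname(b1["F1"] + 25 * b1["F1:F2"])  # slope of F1 at F2 = 25; should match the glht estimate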
Here is the corresponding negative binomial glmmTMB model, which I now prefer:
library(glmmTMB)
m2 <- glmmTMB(CT ~ F1*F2 + (1|R/NR) + offset(LOG_TOT),
              data = df, verbose = FALSE, family = "nbinom2")
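(In case it matters for the post hoc step: the fixed effects and their covariance matrix for this model live in the conditional component of the fit, e.g.)
fixef(m2)$cond   # conditional-model fixed effects: (Intercept), F1, F2, F1:F2
vcov(m2)$cond    # their estimated covariance matrix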
According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (or its more recent successor, emmeans).
I followed Ben's suggestion, running
source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))
and I can then use emmeans on the glmmTMB object. For example,
as.glht(emmeans(m2,~(F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813
But this does not seem correct. I can also change F1 and F2 to factors and then try this:
as.glht(emmeans(m2, ~ (F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
0, 0 == 0 -6.721
1, 0 == 0 -6.621
2, 0 == 0 -6.342
3, 0 == 0 -6.740
4, 0 == 0 -6.474
6, 0 == 0 -6.967
0, 8 == 0 -6.694
1, 8 == 0 -6.651
2, 8 == 0 -6.227
3, 8 == 0 -6.812
4, 8 == 0 -6.371
6, 8 == 0 -6.920
0, 25 == 0 -6.653
1, 25 == 0 -6.648
2, 25 == 0 -6.282
3, 25 == 0 -6.766
4, 25 == 0 -6.338
6, 25 == 0 -6.702
0, 75 == 0 -6.470
1, 75 == 0 -6.642
2, 75 == 0 -6.091
3, 75 == 0 -6.531
4, 75 == 0 -5.762
6, 75 == 0 -6.612
But, again, I am not sure how to bend this output to my will. If some kind person could tell me how to correctly carry over the use of formulae in glht and linfct to the emmeans scenario with glmmTMB, I would be very grateful. I have read all the manuals and vignettes until I am blue in the face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?), I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.
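To make concrete the kind of result I am after, here is my (possibly naive) hand calculation of one of the glht-style contrasts directly from the glmmTMB fit, assuming the default coefficient names (Intercept), F1, F2 and F1:F2; what I would like is the emmeans/glht way to get all of these contrasts, with multiplicity adjustment, in one go:
b <- fixef(m2)$cond                  # conditional fixed effects
V <- vcov(m2)$cond                   # their covariance matrix
k <- c("(Intercept)" = 0, "F1" = 1, "F2" = 0, "F1:F2" = 25)  # "F1 + 25*F1:F2 == 0"
est <- sum(k * b[names(k)])
se  <- sqrt(drop(t(k) %*% V[names(k), names(k)] %*% k))
c(estimate = est, SE = se, z = est/se)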
The glht approach to post hoc testing carries over directly to the glmmADMB package, but glmmADMB is about 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each involving 300,000 fits of the negative binomial mixed model, so speed is essential.
Many thanks for your suggestions and help!