
I am trying to replicate a textbook example of a Tukey test for a one-way mixed-effects ANOVA (Statistics, William L. Hays, pp. 581-583), but the p-values I am getting from lme and glht don't make sense.

The study has repeated measures on 10 subjects, each measured under four treatments.

The Data

subject=c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 
    6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10)

treatment=c("a1", "a2", "a3", "a4", "a1", "a2", "a3", "a4", "a1", "a2", "a3", 
    "a4", "a1", "a2", "a3", "a4", "a1", "a2", "a3", "a4", "a1", "a2", 
    "a3", "a4", "a1", "a2", "a3", "a4", "a1", "a2", "a3", "a4", "a1", 
    "a2", "a3", "a4", "a1", "a2", "a3", "a4")

response=c(11, 9, 5, 17, 14, 12, 10, 18, 15, 7, 6, 21, 17, 10, 13, 22, 15, 7, 6, 
    15, 14, 8, 13, 22, 9, 6, 7, 15, 17, 11, 10, 19, 10, 13, 14, 23, 12, 
    8, 11, 20)

dataFrame=data.frame(subject, treatment, response)
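
A small optional step for anyone rerunning this on a current R installation: since R 4.0.0, data.frame() no longer converts character columns to factors by default, so it is safest to make subject and treatment explicit factors before fitting.

# Optional: make the grouping variables explicit factors; on R >= 4.0.0
# data.frame() no longer auto-converts character columns to factors
dataFrame$subject   <- factor(dataFrame$subject)
dataFrame$treatment <- factor(dataFrame$treatment)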

The Model

library(nlme)
model = lme(response ~ treatment, random = ~ 1 | subject, data = dataFrame)
anova(model)

            numDF denDF  F-value p-value
(Intercept)     1    27 375.9198  <.0001
treatment       3    27  43.4507  <.0001

This F-value is close enough to Hays's (F = 43.41) that I am pretty sure my model is fine.

The Tukey Test

library(multcomp)
glht.out = glht(model, mcp(treatment = "Tukey"))
summary(glht.out)

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lme.formula(fixed = response ~ treatment, data = dataFrame, random = ~1 | 
    subject)

Linear Hypotheses:
             Estimate Std. Error z value Pr(>|z|)    
a2 - a1 == 0   -4.300      1.006  -4.276   <0.001 ***
a3 - a1 == 0   -3.900      1.006  -3.879   <0.001 ***
a4 - a1 == 0    5.800      1.006   5.768   <0.001 ***
a3 - a2 == 0    0.400      1.006   0.398    0.979    
a4 - a2 == 0   10.100      1.006  10.044   <0.001 ***
a4 - a3 == 0    9.700      1.006   9.647   <0.001 ***

This is not consistent with the book. Hays only reported some of the comparisons, giving the HSD and the mean difference rather than p-values, but for the a3 - a1 contrast he found no significant difference, with HSD = 4.02 and a mean difference of 3.9, which by my calculation corresponds to p = 1-ptukey(3.9*sqrt(10/8.27), 4, 9) = 0.05719563.
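
Written out as a block, that calculation is (I am taking 8.27 as the error mean square from the book, n = 10 subjects, and df = 9; the df choice is something I revisit in the comments below):

# studentized range statistic for the a3 - a1 difference: |mean diff| / sqrt(MSE / n)
q <- 3.9 * sqrt(10 / 8.27)
1 - ptukey(q, 4, 9)
[1] 0.05719563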

The R output also doesn't make sense, because the Tukey test p-value, which is supposed to control for multiple comparisons, is much smaller than the p-value from a paired t-test: p = 0.0142 using t.test(c(11, 14, 15, 17, 15, 14, 9, 17, 10, 12), c(5, 10, 6, 13, 6, 13, 7, 10, 14, 11), paired = TRUE).
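
The same paired t-test can also be pulled straight from the data frame instead of typing the vectors by hand (the data are ordered by subject, so the pairing lines up):

# a1 vs a3 responses, one value per subject, in subject order
a1 <- dataFrame$response[dataFrame$treatment == "a1"]
a3 <- dataFrame$response[dataFrame$treatment == "a3"]
t.test(a1, a3, paired = TRUE)  # p-value = 0.0142, as quoted above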

Any idea what I am doing wrong and how I can correctly perform a Tukey test in R?


1 Answer


I investigated your question and these are my findings. Unfortunately, I don't have Hays's book, so I focused on the information you posted.

1) Regarding the HSD = 4.02 reported in Hays's book: the lme function defaults to REML fitting, which maximises the restricted log-likelihood, whereas the ML method maximises the log-likelihood (see ?lme for details). The model fit with the ML method gives the same F-value.

model <- lme(response~treatment, random=~1|subject, data=dataFrame, method="ML")
anova.lme(model)
            numDF denDF  F-value p-value
(Intercept)     1    27 375.9227  <.0001
treatment       3    27  43.4506  <.0001

When running the multiple comparisons with glht on this fit, the a3 - a1 z value (4.088) is closer to the one reported by the book, but the contrast is still found significant.

glht.out <- glht(model, mcp(treatment="Tukey"), alternative="two.sided")
summary(glht.out)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lme.formula(fixed = response ~ treatment, data = dataFrame, random = ~1 | 
    subject, method = "ML")

Linear Hypotheses:
             Estimate Std. Error z value Pr(>|z|)    
a2 - a1 == 0  -4.3000     0.9539  -4.508  < 1e-04 ***
a3 - a1 == 0  -3.9000     0.9539  -4.088  0.00027 ***
a4 - a1 == 0   5.8000     0.9539   6.080  < 1e-04 ***
a3 - a2 == 0   0.4000     0.9539   0.419  0.97520    
a4 - a2 == 0  10.1000     0.9539  10.588  < 1e-04 ***
a4 - a3 == 0   9.7000     0.9539  10.168  < 1e-04 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

2) Regarding the calculation of the p-value: the previous output shows that the p-values are adjusted with the single-step method. The ?summary.glht help page explains that any of the methods implemented in adjusted and p.adjust can be used. Here is an example of how to change it:

summary(glht.out, test = adjusted(type = "single-step"))
summary(glht.out, test = adjusted(type = "none"))

Unfortunately, I couldn't reproduce the p-values from the glht output, so I checked the TukeyHSD function instead. It requires an aov fit.

aovmodel <- aov(model)
summary(aovmodel)
            Df Sum Sq Mean Sq F value   Pr(>F)    
treatment    3  659.0  219.67   26.95 2.57e-09 ***
Residuals   36  293.4    8.15                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Running TukeyHSD gives the following output:

TukeyHSD(aovmodel, "treatment")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = model)

$treatment
      diff       lwr        upr     p adj
a2-a1 -4.3 -7.738482 -0.8615177 0.0093920
a3-a1 -3.9 -7.338482 -0.4615177 0.0210165
a4-a1  5.8  2.361518  9.2384823 0.0003378
a3-a2  0.4 -3.038482  3.8384823 0.9891641
a4-a2 10.1  6.661518 13.5384823 0.0000000
a4-a3  9.7  6.261518 13.1384823 0.0000000

That code is easier for me to understand than the multcomp output. The p-values can be reproduced as follows:

MSE <- 8.15                      # residual mean square from the aov table above
center <- -3.9 / sqrt(MSE / 10)  # studentized range statistic, -4.32002
ptukey(abs(center), 4, 36, lower.tail = FALSE)
[1] 0.02101645

Note that the studentized range statistic is now -4.32002 and the degrees of freedom are N - k = 40 - 4 = 36. The resulting p-value of 0.021 is slightly higher than the one from the paired t-test. Here is a Gist to reproduce the output.
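
As a cross-check (this is just my own generalisation of the calculation above, not something taken from the book), the same recipe reproduces all six adjusted p-values of the TukeyHSD table from the treatment means:

means <- tapply(dataFrame$response, dataFrame$treatment, mean)
MSE   <- 8.15                             # residual mean square from the aov fit
n     <- 10                               # observations per treatment
q     <- abs(outer(means, means, "-")) / sqrt(MSE / n)  # studentized range statistics
p.adj <- ptukey(q, nmeans = 4, df = 36, lower.tail = FALSE)
round(p.adj[lower.tri(p.adj)], 5)         # same contrast order as the TukeyHSD table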

Hope this helps

Emer
  • Thanks Emer for the detailed response! – Sarah Stanley Feb 21 '18 at 14:29
  • However, I was intentionally trying to avoid TukeyHSD and an aov model, because aov can't handle missing data: if any data are missing for a subject, the whole subject is thrown out. In this particular example that is fine, but I need a proper mixed model (lme or lmer) for my real data. I have realized that the z-scores given by glht give a much more reasonable answer if you calculate the p-value from a studentized range distribution: `1-ptukey(3.879, 4, 27) = 0.04931575`, which isn't exactly Hays's value but is much more reasonable (see the sketch after these comments). – Sarah Stanley Feb 21 '18 at 14:47
  • I was using df = (number of subjects - 1)(number of treatments - 1), which I think is correct for a within-subjects design, but perhaps I should be using df = N - k. – Sarah Stanley Feb 21 '18 at 14:50
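
A minimal sketch of the calculation described in the last two comments, using the glht fit from the question (the REML one, whose a3 - a1 z value is 3.879) and assuming summary(glht.out)$test$tstat is the right place to pull the statistics from; whether those z values and df = 27 are the correct inputs to ptukey is exactly the question left open here:

z <- summary(glht.out)$test$tstat               # glht z statistics for the six contrasts
ptukey(abs(z), nmeans = 4, df = 27, lower.tail = FALSE)
# for a3 - a1 this is 1 - ptukey(3.879, 4, 27) = 0.04931575, as in the comment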