TurkeyHSD on Anova in R includes unintended comparisons

Question

I have a dataset partially shown below. I have used an anova to look at the difference between genders in the value with region as a covariate. However, when using TukeyHSD for multiple comparisons correction I find that is shows more comparisons than necessary (as below). The only part I am interested in is the first line (i.e. M vs. F within each region), not between regions. Is there a way to specify this in the anova model or Tukey? Thanks!

ID  Age Sex Hemi Region value
1   62  M   R   a   1.81
2   62  M   R   a   1.90
3   72  M   R   a   2.25
1   61  M   L   a   1.58
2   57  F   L   a   2.66
3   62  M   L   a   2.19
1   72  M   R   b   1.93
2   64  F   R   b   1.07
3   65  F   R   b   1.37
1   64  M   L   b   0.97
2   77  F   L   b   1.59
3   27  M   L   b   1.84

model=aov(value~Sex*Region,data=data TukeyHSD(model)

$Sex:as.factor(Region)
         diff          lwr        upr           p adj 
M:a-F:a  0.1125161943  -0.2989139  0.5239462408 1.0000000
F:b-F:a  -0.0383423077 -0.4866878  0.4100031910 1.0000000
M:b-F:a  0.1273767206  -0.2840533  0.5388067671 1.0000000

Roman Luštrik · Accepted Answer · 2014-10-19T11:16:07.113

Sure there is. Exclude the interaction term in your model.

> my.data <- data.frame(y = rnorm(100), region = rep(c("a", "b"), each = 50), gender = sample(c("M", "F"), 100, replace = TRUE))
> head(my.data)
           y region gender
1  0.3333316      a      M
2  1.6364059      a      F
3  0.6679500      a      F
4 -0.7460313      a      M
5  0.7327712      a      F
6 -0.8305134      a      M
> tail(my.data)
               y region gender
95  -1.387498267      b      F
96   0.124378046      b      M
97  -0.568782743      b      M
98  -0.004699849      b      M
99  -1.449213423      b      F
100 -1.146313313      b      F
> 
> mdl1 <- aov(y ~ region * gender, data = my.data)
> TukeyHSD(mdl1)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ region * gender, data = my.data)

$region
          diff        lwr       upr     p adj
b-a -0.0140727 -0.4041448 0.3759994 0.9430592

$gender
        diff        lwr       upr     p adj
M-F -0.13387 -0.5267813 0.2590414 0.5004702

$`region:gender`
               diff        lwr        upr     p adj
b:F-a:F -0.57419765 -1.3487805 0.20038521 0.2191452
a:M-a:F -0.63398152 -1.3658935 0.09793048 0.1136445
b:M-a:F -0.20795605 -0.9398681 0.52395596 0.8794703
a:M-b:F -0.05978387 -0.7916959 0.67212813 0.9965356
b:M-b:F  0.36624160 -0.3656704 1.09815361 0.5599334
b:M-a:M  0.42602548 -0.2605688 1.11261980 0.3710470

> 
> mdl2 <- update(mdl1, . ~ region + gender)
> TukeyHSD(mdl2)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ region + gender, data = my.data)

$region
          diff        lwr       upr     p adj
b-a -0.0140727 -0.4147704 0.3866251 0.9445724

$gender
        diff        lwr       upr     p adj
M-F -0.13387 -0.5374843 0.2697444 0.5119119

If you want to delete/keep specific terms, you can do it like so.

> tk <- TukeyHSD(mdl1)
> str(tk)
List of 3
 $ region       : num [1, 1:4] -0.0141 -0.4041 0.376 0.9431
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "b-a"
  .. ..$ : chr [1:4] "diff" "lwr" "upr" "p adj"
 $ gender       : num [1, 1:4] -0.134 -0.527 0.259 0.5
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "M-F"
  .. ..$ : chr [1:4] "diff" "lwr" "upr" "p adj"
 $ region:gender: num [1:6, 1:4] -0.5742 -0.634 -0.208 -0.0598 0.3662 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "b:F-a:F" "a:M-a:F" "b:M-a:F" "a:M-b:F" ...
  .. ..$ : chr [1:4] "diff" "lwr" "upr" "p adj"
 - attr(*, "class")= chr [1:2] "TukeyHSD" "multicomp"
 - attr(*, "orig.call")= language aov(formula = y ~ region * gender, data = my.data)
 - attr(*, "conf.level")= num 0.95
 - attr(*, "ordered")= logi FALSE
> rntk <- rownames(tk[["region:gender"]])
> tk[["region:gender"]] <- tk[["region:gender"]][grepl("b:F-a:F|a:M-a:F", rntk), ]
> tk
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ region * gender, data = my.data)

$region
          diff        lwr       upr     p adj
b-a -0.0140727 -0.4041448 0.3759994 0.9430592

$gender
        diff        lwr       upr     p adj
M-F -0.13387 -0.5267813 0.2590414 0.5004702

$`region:gender`
              diff       lwr        upr     p adj
b:F-a:F -0.5741976 -1.348781 0.20038521 0.2191452
a:M-a:F -0.6339815 -1.365894 0.09793048 0.1136445

I am not sure if the OP wants to delete the whole interactions. I thought the OP would want to keep a part of the interaction, which is the first of the interaction. I wonder if it is straightforward to make R print a part of interactions. — jazzurro, Oct 19 '14 at 10:45
In that case, one should access the `region:gender` interaction and remove the lines manually. — Roman Luštrik, Oct 19 '14 at 10:54
Thanks for this Roman. Deleting part of the interaction would show what I want, but would it change the adjusted-p if I could specify the comparisons rather than running all and then deleting some of them? — Jon, Oct 19 '14 at 12:48
@Jon All calculation is done before you choose specific interactions. So I do not think this specific operation affects adjusted p. By the way, if you just want to extract specific interactions only, you could do something like `tk$`Sex:Region`[c(1,6),]`. — jazzurro, Oct 19 '14 at 15:08
Thanks, that seems like a easy way to select the interactions I'm interested in. — Jon, Oct 19 '14 at 15:31

TurkeyHSD on Anova in R includes unintended comparisons

1 Answers1