
Let's say I have data on the lung capacity of smokers and non-smokers: a numeric variable "lungCap" and a variable "smoking" with the values "yes" or "no". Now I want to test whether the lung capacity of non-smokers is greater than that of smokers:

t.test(lungCap~smoking, alt="greater")

Does the test now calculate if "yes" > "no" or "no" > "yes"? How is this determined? I could not find it in the help for the t.test command.

E_H
  • In the documentation: alternative = "greater" is the alternative that x has a larger mean than y. – phiver Apr 15 '18 at 08:59
  • @phiver - true, but the documentation does not directly specify how `t.test()` allocates values of the independent variable as `x` versus `y`, so it's not obvious which one will print in the left column of the output (the x side) versus the right column (the y side). – Len Greski Apr 15 '18 at 16:09

1 Answer


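When the independent variable is a character vector, t.test() converts it to a factor whose levels are sorted alphabetically. The first level is treated as x and the second as y, so alternative = "greater" tests whether the mean of the alphabetically first group is greater than the mean of the second.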
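
One way to check the order before running the test is to look at the factor levels directly. This is a minimal sketch: the `smoking` vector is a made-up stand-in for the column in the question, and `grp_levels` is just an illustrative name.

```r
# Hypothetical stand-in for the question's `smoking` column (made-up values).
smoking <- c("yes", "no", "no", "yes")

# The formula method of t.test() passes the grouping variable through factor();
# for a character vector the resulting levels are sorted.
grp_levels <- levels(factor(smoking))

# The first level plays the role of x, the second the role of y.
print(grp_levels)
```

Here "no" sorts before "yes", so in the question's call alt="greater" would test whether the mean for "no" is greater than the mean for "yes".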

To illustrate, we'll compare miles per gallon in cars with manual vs. automatic transmissions using the 1973 Motor Trend cars data set.

We'll create a character variable to represent automatic vs. manual (to illustrate the scenario in the OP) and run a t test.

We'll test the following hypotheses:

  • H_null: mpg of manual transmission cars <= mpg of automatic transmission cars
  • H_alt: mpg of manual transmission cars is greater than mpg of automatic transmission cars.

To run the test, we'll load the data, create the extra column and execute t.test().

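data(mtcars)
# Recode the 0/1 am column as a character variable, as in the OP's scenario
mtcars$trans <- ifelse(mtcars$am == 1, "manual", "automatic")

# One-sided Welch t-test; alt partially matches the alternative argument
t.test(mtcars$mpg ~ mtcars$trans, alt = "greater")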

...and the output:

> t.test(mtcars$mpg ~ mtcars$trans,alt="greater")

    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$trans
t = -3.7671, df = 18.332, p-value = 0.9993
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -10.57662       Inf
sample estimates:
mean in group automatic    mean in group manual 
               17.14737                24.39231 

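What we see here is that t.test() tests automatic > manual, because "automatic" comes first alphabetically and is therefore treated as x. The automatic group actually has the lower mean, hence the p-value of 0.9993.

To test the hypothesis we actually stated, we switch to the alt="less" argument.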

> t.test(mtcars$mpg ~ mtcars$trans,alt="less")

    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$trans
t = -3.7671, df = 18.332, p-value = 0.0006868
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -3.913256
sample estimates:
mean in group automatic    mean in group manual 
               17.14737                24.39231 


Here the reported p-value is 0.0006868, so we reject the null hypothesis in favor of the alternate hypothesis that automatic transmission cars have lower average miles per gallon than manual transmission cars.
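
The two runs differ only in which tail of the same t distribution is used, so their p-values should add up to 1 (0.9993 and 0.0006868 in the outputs shown). The following is a small sketch that checks this; `p_greater` and `p_less` are illustrative names, and the calls use the `data =` form rather than `mtcars$` for brevity.

```r
data(mtcars)
mtcars$trans <- ifelse(mtcars$am == 1, "manual", "automatic")

# Same data and same t statistic; only the tail changes.
p_greater <- t.test(mpg ~ trans, data = mtcars, alternative = "greater")$p.value
p_less    <- t.test(mpg ~ trans, data = mtcars, alternative = "less")$p.value

# The one-sided p-values are complementary tail areas.
print(p_greater + p_less)
```

A p-value very close to 1 in a one-sided test is therefore a hint that the groups may be in the opposite order from the one intended.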

Changing the Order of Comparison

In response to the questions in the comments about changing the grouping order: t.test() itself has no argument for this. However, one can simply prefix the group names with "1. " and "2. " so that the group labeled "1. " sorts first and becomes the first group in the comparison.

Returning to our mtcars example, if we want manual transmissions to be the first group in the comparison, so that we get a positive t value for the alternate hypothesis H_alt: mpg(manual) > mpg(automatic), we could use the following code.

data(mtcars)
mtcars$trans <- ifelse(mtcars$am == 1,"1. manual","2. automatic")
t.test(mtcars$mpg ~ mtcars$trans,alt="greater")

...and the output:

> t.test(mtcars$mpg ~ mtcars$trans,alt="greater")

    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$trans
t = 3.7671, df = 18.332, p-value = 0.0006868
alternative hypothesis: true difference in means between group 1. manual and group 2. automatic is greater than 0
95 percent confidence interval:
 3.913256      Inf
sample estimates:
   mean in group 1. manual mean in group 2. automatic 
                  24.39231                   17.14737 
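
An alternative that keeps the original labels, sketched here as an assumption rather than as part of the approach above: if the grouping variable is already a factor, the formula method keeps its level order, so the levels can be set explicitly. The name `res` is illustrative.

```r
data(mtcars)

# Explicit level order: the first level is treated as x, the second as y.
mtcars$trans <- factor(ifelse(mtcars$am == 1, "manual", "automatic"),
                       levels = c("manual", "automatic"))

res <- t.test(mpg ~ trans, data = mtcars, alternative = "greater")

# The t statistic is positive because manual is now the first group.
print(res$statistic)
print(res$p.value)
```

This should give the same t value and p-value as the "1. manual" / "2. automatic" version, without having to recode the labels back afterwards.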
Len Greski
  • The null hypothesis should be `mpg of manual transmission cars = mpg of automatic transmission cars` –  Mar 18 '19 at 22:34
  • @JasonBaik - the option `alt="less"` specifies a one tailed test. In a one tailed test the alternate hypothesis is mpg(manual) > mpg(automatic). Therefore, the null hypothesis is mpg(manual) <= mpg(automatic) because the rejection region is all on one side of the normal curve. For a two tailed test the null hypothesis is mpg(manual) = mpg(automatic), but we didn't conduct a two tailed test. My answer uses a one tailed test because the OP asked a question that is best answered with a one tailed t-test, not a two tailed test. – Len Greski Mar 18 '19 at 22:43
  • Is there a flag to compute a two-tailed contrast using inverse alphabetical order, or a specified order? It would be nice to be able to specify whether it does "a - b" or "b - a", because that impacts the sign of the t-value and the confidence intervals of the mean differences. – Kayle Sawyer Sep 29 '21 at 00:54
  • @KayleSawyer Did you find a solution to reverse the order in which the groups are compared? I have the same issue with groups split into "0" and "1" but I need to keep them grouped as such for later analyses. I tried fct_rev() on them, but it didn't change the sign on the mean difference or CI – Cassandra Nov 11 '21 at 18:48
  • @Cassandra -- see my updated answer for a technique that changes the order of comparison. – Len Greski Nov 12 '21 at 19:53
  • @Cassandra I ended up recoding the levels of my grouping variable to numbers, like Len Greski suggests. You could also append "A. " and "B. " in front. Then you can recode them back after obtaining the t statistics, for later analyses. – Kayle Sawyer Nov 12 '21 at 21:46