I have a dataset with two variables (x1 and x2) for many firms, which belong to different industry groups. I calculate the variable "test1" for about 500 firms. We are given the following code:

library(dplyr)   # provides %>%, group_by() and do()
library(broom)   # provides tidy()

df$test1 <- df$x1 - df$x2

result.test <- df %>% 
  group_by(industry) %>% do(tidy(t.test(.$test1, alt="two.sided", mu=0)))

The results are grouped by "industry", but it's not clear to me how the t-test proceeds. Is the t-test performed on each firm's "test1" and then the average result presented per industry group, or is the average of "test1" determined for each industry group and then the t-test performed?

  • I'm a little unclear on your question. There is only one `test1` variable in the data set, so I don't know what you mean by "for each variable `test1`" ... ? – Ben Bolker Aug 30 '21 at 16:09
  • There are 500 companies in my data set. The variable "test1" is calculated for each company. I updated my question – newbie090909 Aug 30 '21 at 16:12

2 Answers

The t-test is run separately on the subset of rows belonging to each level of `industry`. Here is an example with `mtcars`:

library(dplyr)
library(broom)
result.test <-
  mtcars %>% 
  group_by(cyl) %>%
  do(tidy(t.test(.$drat, alt="two.sided", mu=0)))

# A tibble: 3 x 9
# Groups:   cyl [3]
    cyl estimate statistic  p.value parameter conf.low conf.high method            alternative
  <dbl>    <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>             <chr>      
1     4     4.07      36.9 5.03e-12        10     3.83      4.32 One Sample t-test two.sided  
2     6     3.59      19.9 1.04e- 6         6     3.15      4.03 One Sample t-test two.sided  
3     8     3.23      32.4 7.93e-14        13     3.01      3.44 One Sample t-test two.sided  

Now I will filter for just `cyl == 4`:

mtcars %>% 
  filter(cyl == 4) %>% 
  do(tidy(t.test(.$drat, alt="two.sided", mu=0)))

  estimate statistic  p.value parameter conf.low conf.high method            alternative
     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>             <chr>      
1     4.07      36.9 5.03e-12        10     3.83      4.32 One Sample t-test two.sided 

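I get the same result, so `group_by()` followed by `do()` runs one t-test per level of the grouping variable, using only the rows in that group. The `estimate` column is the mean of the tested variable within that group, and the test asks whether that group mean differs from `mu = 0`.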
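As a cross-check, here is a minimal base R sketch (the names `by_group`, `estimates`, `group_means`, and `dfs` are only illustrative) that runs the same one-sample test on each `cyl` subset. It confirms that each estimate is simply that group's mean of `drat`, and that the degrees of freedom equal the group size minus one:

```r
# Split drat by cyl and run a one-sample t-test on each subset (base R only)
by_group <- lapply(
  split(mtcars$drat, mtcars$cyl),
  function(x) t.test(x, alternative = "two.sided", mu = 0)
)

# Pull the estimate (sample mean) and degrees of freedom from each test
estimates <- sapply(by_group, function(tt) unname(tt$estimate))
dfs       <- sapply(by_group, function(tt) unname(tt$parameter))

# Per-group means and group sizes computed directly from the data
group_means <- tapply(mtcars$drat, mtcars$cyl, mean)
group_sizes <- table(mtcars$cyl)

# The estimates match the group means, and df = n - 1 in each group
all.equal(as.vector(estimates), as.vector(group_means))
all(dfs == as.vector(group_sizes) - 1)
```

So each test uses the individual observations within one group, not a single averaged value per group.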

Vinícius Félix

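We may also use `nest_by`, which returns a row-wise data frame with one row per group and that group's rows stored in a list-column named `data`: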

library(dplyr)
library(tidyr)
library(broom)
mtcars %>%
    nest_by(cyl) %>%
    transmute(out = list(tidy(t.test(data$drat, alt = 'two.sided', 
         mu = 0)))) %>% 
    ungroup %>% 
    unnest(out)

Output:

# A tibble: 3 x 9
    cyl estimate statistic  p.value parameter conf.low conf.high method            alternative
  <dbl>    <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>             <chr>      
1     4     4.07      36.9 5.03e-12        10     3.83      4.32 One Sample t-test two.sided  
2     6     3.59      19.9 1.04e- 6         6     3.15      4.03 One Sample t-test two.sided  
3     8     3.23      32.4 7.93e-14        13     3.01      3.44 One Sample t-test two.sided  
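In recent dplyr releases `do()` is marked as superseded. A sketch of the same per-group test with `group_modify()`, assuming dplyr 1.0 or later and broom are installed (the name `res` is only illustrative):

```r
library(dplyr)
library(broom)

# .x is the subset of rows for the current group (without the cyl column)
res <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ tidy(t.test(.x$drat, alternative = "two.sided", mu = 0))) %>%
  ungroup()

res
```

This should give one row per level of `cyl`, with the same columns as the output above.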
akrun