My data frame looks like this:

category    calss   test1   test2
1            Yes    5.5     4.2
1             No    5.8     4.3
1            Yes    6.6     3.2
2            Yes    6       7.7
2             No    5.7     5.8
3             No    9.7     4.5
3            Yes    6.8     8.5
2             No    6.3     9.6
3            Yes    8.5     2.6

I want to calculate the mean, SD, and p-value (comparing test1 and test2) for each combination of category and class.

I used dplyr to calculate the mean and SD, but I am struggling to calculate the p-value, as my dataset contains 1,000 rows, 4 different categories, and 8 classes.

Here is what I get after using dplyr for the mean and sd:

category    class   test1_Mean  test1_SD    test2_Mean  test2_SD
1            Yes    6              1             3.7    1.1
1             No    5.8             0            4.3    0
2            Yes    9.6             0            4.4    0
2             No     6             1.1           7.7    1
3            Yes    7.6            0.5           5.5    0.8
3             No    9.7             0            4.5    0

The output I want is:

category    class   test1_Mean  test1_SD    test2_Mean  test2_SD    Pvalue
1            Yes           6    1            3.7         1.1        0.05
1            No           5.8   0            4.3           0        0.14
2            Yes          9.6   0            4.4           0        0.69
2            No             6   1.1          7.7           1       0.001
3            Yes          7.6   0.5          5.5         0.8    2.00E-05
3            No           9.7   0            4.5           0        0.04

Thanks in advance.

Shawn Hemelstrand
Rebel_47

3 Answers

You can try:

library(dplyr)
# Welch two-sample t-test of test1 vs. test2 within each category/calss group
df %>%
  group_by(category, calss) %>%
  summarise(pvalue = t.test(test1, test2)$p.value)
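
One caveat with the sample rows: `t.test()` stops with an error when either sample has fewer than two observations, and several category/calss combinations in the question have only one row. Below is a minimal sketch of one way around that, assuming it is acceptable to report `NA` for such groups. `safe_p` is a hypothetical helper name (not part of dplyr), the data frame is rebuilt from the question's sample, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample rows copied from the question; `calss` is spelled as in its header
df <- data.frame(
  category = c(1, 1, 1, 2, 2, 3, 3, 2, 3),
  calss    = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"),
  test1    = c(5.5, 5.8, 6.6, 6, 5.7, 9.7, 6.8, 6.3, 8.5),
  test2    = c(4.2, 4.3, 3.2, 7.7, 5.8, 4.5, 8.5, 9.6, 2.6)
)

# Hypothetical helper: return NA instead of failing when a group is too
# small for t.test(), or when t.test() raises any other error
safe_p <- function(x, y) {
  if (length(x) < 2 || length(y) < 2) return(NA_real_)
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

res <- df %>%
  group_by(category, calss) %>%
  summarise(pvalue = safe_p(test1, test2), .groups = "drop")

print(res)
```

With the full 1,000-row dataset most groups will probably be large enough, so the guard should only matter for sparse combinations.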
Ronak Shah

I think this is what you are looking for:

library(dplyr)
# The grouping column is spelled `calss` in the question's data
df %>% group_by(category, calss) %>%
  summarise(test1_mean = mean(test1), test2_mean = mean(test2),
            test1_SD = sd(test1), test2_SD = sd(test2),
            pvalue = t.test(test1, test2)$p.value)
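
Note that `t.test(test1, test2)` with default arguments runs a Welch two-sample test, which treats the two columns as independent samples. If each row holds two measurements of the same subject (an assumption, since the question does not say), a paired test may be the better fit. Here is a small sketch with made-up numbers for a single group, only to show the two calls side by side:

```r
# Hypothetical values for one group, NOT taken from the question
test1 <- c(5.5, 6.6, 6.1, 5.9)
test2 <- c(4.2, 3.2, 4.0, 3.9)

# Default: Welch two-sample t-test (independent samples, unequal variances)
welch_p <- t.test(test1, test2)$p.value

# Paired t-test: compares the row-wise differences test1 - test2
paired_p <- t.test(test1, test2, paired = TRUE)$p.value

print(c(welch = welch_p, paired = paired_p))
```

Inside `summarise()`, the only change would be adding `paired = TRUE` to the `t.test()` call.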

An option with data.table

library(data.table)
setDT(df)[, .(pvalue = t.test(test1, test2)$p.value), by = .(category, calss)]
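
The same approach can be extended to produce the whole table the question asks for (means, SDs, and p-value) in one data.table call. The sketch below assumes the question's sample rows and the output column names from its desired table. Groups with fewer than two rows get `NA` for the p-value, because `t.test()` would otherwise stop with an error.

```r
library(data.table)

# Sample rows copied from the question; `calss` is spelled as in its header
dt <- data.table(
  category = c(1, 1, 1, 2, 2, 3, 3, 2, 3),
  calss    = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"),
  test1    = c(5.5, 5.8, 6.6, 6, 5.7, 9.7, 6.8, 6.3, 8.5),
  test2    = c(4.2, 4.3, 3.2, 7.7, 5.8, 4.5, 8.5, 9.6, 2.6)
)

res <- dt[, .(
  test1_Mean = mean(test1),
  test1_SD   = sd(test1),   # NA for single-row groups
  test2_Mean = mean(test2),
  test2_SD   = sd(test2),
  # NA when the group is too small or t.test() fails for another reason
  Pvalue     = if (.N >= 2) {
    tryCatch(t.test(test1, test2)$p.value, error = function(e) NA_real_)
  } else {
    NA_real_
  }
), by = .(category, calss)]

print(res)
```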
akrun