How to get mean, SD, and p-value for multiple groups in R?

Question

My data frame looks like this:

category    calss   test1   test2
1            Yes    5.5     4.2
1             No    5.8     4.3
1            Yes    6.6     3.2
2            Yes    6       7.7
2             No    5.7     5.8
3             No    9.7     4.5
3            Yes    6.8     8.5
2             No    6.3     9.6
3            Yes    8.5     2.6

I want to calculate the mean, SD, and p-value (test1 vs. test2) for each combination of category and class.

I used dplyr to calculate the mean and SD, but I am struggling to calculate the p-value, as my dataset contains 1000 rows, 4 different categories, and 8 classes.

Here is what I get after using dplyr for the mean and sd:

category    class   test1_Mean  test1_SD    test2_Mean  test2_SD
1           Yes     6           1           3.7         1.1
1           No      5.8         0           4.3         0
2           Yes     9.6         0           4.4         0
2           No      6           1.1         7.7         1
3           Yes     7.6         0.5         5.5         0.8
3           No      9.7         0           4.5         0

The output I want is:

category    class   test1_Mean  test1_SD    test2_Mean  test2_SD    Pvalue
1           Yes     6           1           3.7         1.1         0.05
1           No      5.8         0           4.3         0           0.14
2           Yes     9.6         0           4.4         0           0.69
2           No      6           1.1         7.7         1           0.001
3           Yes     7.6         0.5         5.5         0.8         2.00E+05
3           No      9.7         0           4.5         0           0.04

Thanks in advance.

Not sure what you mean by Pvalue, are you performing any statistical tests such as two sample T-test? — Karthik S, Oct 31 '20 at 09:50

score 2 · Accepted Answer · answered Oct 31 '20 at 10:01

You can try:

library(dplyr)
# Welch two-sample t-test (the t.test() default) of test1 vs. test2 per group
df %>%
  group_by(category, calss) %>%
  summarise(pvalue = t.test(test1, test2)$p.value)
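
One caveat: `t.test()` stops with an error when a group has fewer than two rows, and the sample data has such groups (category 1 with class No is a single row). Below is a minimal sketch of one way around that, assuming it is acceptable to report `NA` for groups where the test cannot run. The helper name `safe_p` is hypothetical, and `df` simply re-enters the sample rows from the question.

```r
library(dplyr)

# Sample rows from the question (the grouping column is spelled `calss` there)
df <- data.frame(
  category = c(1, 1, 1, 2, 2, 3, 3, 2, 3),
  calss    = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"),
  test1    = c(5.5, 5.8, 6.6, 6, 5.7, 9.7, 6.8, 6.3, 8.5),
  test2    = c(4.2, 4.3, 3.2, 7.7, 5.8, 4.5, 8.5, 9.6, 2.6)
)

# Hypothetical helper: return NA instead of failing when t.test() cannot run
# (for example, fewer than two observations or constant data in a group)
safe_p <- function(x, y) {
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

res <- df %>%
  group_by(category, calss) %>%
  summarise(pvalue = safe_p(test1, test2), .groups = "drop")

res
```

On the sample rows, the three single-row groups get `NA` and the other three get a Welch p-value.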

answered Oct 31 '20 at 10:01

Ronak Shah


score 0 · Answer 2 · answered Oct 31 '20 at 10:15

I think this is what you are looking for:

library(dplyr)
# The sample data spells the grouping column `calss`; use `class` if yours does
df %>%
  group_by(category, calss) %>%
  summarise(test1_mean = mean(test1), test2_mean = mean(test2),
            test1_SD = sd(test1), test2_SD = sd(test2),
            pvalue = t.test(test1, test2)$p.value)

answered Oct 31 '20 at 10:15

Wahiduzzaman Khan


score 0 · Answer 3 · answered Oct 31 '20 at 19:26

An option with data.table:

library(data.table)
# Convert df to a data.table by reference, then run the t-test per group
setDT(df)[, .(pvalue = t.test(test1, test2)$p.value), .(category, calss)]
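
To get the means, SDs, and p-value in a single table, as in the desired output, the same grouping could be extended along these lines. This is a sketch under assumptions: `dt` re-enters the sample rows from the question, groups with fewer than two rows are reported as `NA` rather than tested, and every larger group is assumed to have non-constant values (otherwise `t.test()` would still fail).

```r
library(data.table)

# Sample rows from the question (the grouping column is spelled `calss` there)
dt <- data.table(
  category = c(1, 1, 1, 2, 2, 3, 3, 2, 3),
  calss    = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"),
  test1    = c(5.5, 5.8, 6.6, 6, 5.7, 9.7, 6.8, 6.3, 8.5),
  test2    = c(4.2, 4.3, 3.2, 7.7, 5.8, 4.5, 8.5, 9.6, 2.6)
)

# .N is the number of rows in the current group; skip the test when it is < 2
res <- dt[, .(
  test1_Mean = mean(test1),
  test1_SD   = sd(test1),
  test2_Mean = mean(test2),
  test2_SD   = sd(test2),
  Pvalue     = if (.N >= 2) t.test(test1, test2)$p.value else NA_real_
), by = .(category, calss)]

res
```

Note that `sd()` returns `NA` for a single-row group, not 0.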

answered Oct 31 '20 at 19:26

akrun

