How to extract pvalue for each column in r?

Question

I have a dataframe as follows:

df = structure(list(aa = c(1L, 5L, 8L, 10L, 1L, 10L, 8L, 6L, 7L, 4L, 
1L, 5L, 7L, 7L, 5L, 8L), bb = c(2L, 9L, 1L, 10L, 8L, 7L, 10L, 
8L, 1L, 7L, 2L, 10L, 3L, 5L, 2L, 10L), cc = c(1L, 5L, 9L, 4L, 
9L, 1L, 8L, 3L, 2L, 2L, 2L, 5L, 7L, 2L, 2L, 3L), dd = c(10L, 
5L, 8L, 10L, 6L, 8L, 7L, 5L, 2L, 9L, 10L, 6L, 5L, 3L, 7L, 8L), 
    ee = c(5L, 7L, 5L, 1L, 8L, 4L, 5L, 2L, 10L, 6L, 8L, 10L, 
    6L, 5L, 10L, 6L), Group = c("High", "High", "High", "High", 
    "High", "High", "High", "High", "Low", "Low", "Low", "Low", 
    "Low", "Low", "Low", "Low")), class = "data.frame", row.names = c(NA, 
-16L))

I want to calculate pvalue for each column based on the Group mentioned in the table. my expected output is:

values  pvalue  t        mean in High     mean in Low 
aa      0.08    0.41523  6.8              5
bb      0.89    1.41523  6.8              4
cc      0.088   2.41523  2.3              8
dd      0.89    3.41523  9.6              2
ee      0.76    4.41523  4.3              5

I tried following code to generate the pvalue:

# Compute t-test
res <- t.test(aa ~ Group, data = df)
res

It results as:

    Welch Two Sample t-test

data:  aa by Group
t = 0.41523, df = 11.794, p-value = 0.6854
alternative hypothesis: true difference in means between group High and group Low is not equal to 0
95 percent confidence interval:
 -2.660919  3.910919
sample estimates:
mean in group High  mean in group Low 
             6.125              5.500

Onyambu · Accepted Answer · 2023-02-28T07:38:58.910

want <- c('p.value','estimate', 'statistic')
t(sapply(head(names(df),-1),\(x)unlist(t.test(reformulate('Group', x), df)[want])))

     p.value estimate.mean in group High estimate.mean in group Low statistic.t
aa 0.6854296                       6.125                      5.500   0.4152274
bb 0.3093107                       6.875                      5.000   1.0550233
cc 0.1938533                       5.000                      3.125   1.3833764
dd 0.3738283                       7.375                      6.250   0.9219951
ee 0.0177543                       4.625                      7.625  -2.6880860


pivot_longer(df,-Group) %>%
   group_by(name)%>%
   summarise(mod = list(unlist(t.test(value~Group)[want])))%>%
   unnest_wider(mod)

# A tibble: 5 × 5
  name  p.value `estimate.mean in group High` `estimate.mean in group Low` statistic.t
  <chr>   <dbl>                         <dbl>                        <dbl>       <dbl>
1 aa     0.685                           6.12                         5.5        0.415
2 bb     0.309                           6.88                         5          1.06 
3 cc     0.194                           5                            3.12       1.38 
4 dd     0.374                           7.38                         6.25       0.922
5 ee     0.0178                          4.62                         7.62      -2.69

score 0 · Answer 2 · answered Feb 28 '23 at 06:47

setNames(
  do.call(
    rbind.data.frame,
    sapply(
      1:(ncol(df)-1),
      function(x){
        tmp=t.test(df[,x]~df$Group)
        data.frame(c(colnames(df)[x],tmp$p.value,tmp$statistic,tmp$estimate))
      }
    )
  ),
  c("values","pvalue","t","mean in High","mean in Low ")
)

  values             pvalue                 t mean in High mean in Low 
1     aa  0.685429645277524   0.4152273992687        6.125          5.5
2     bb  0.309310704596943  1.05502331962237        6.875            5
3     cc  0.193853298306977  1.38337639677856            5        3.125
4     dd  0.373828335118548 0.921995098966768        7.375         6.25
5     ee 0.0177542973016432 -2.68808602012899        4.625        7.625

How to extract pvalue for each column in r?

2 Answers2