
Consider the following dataframe:

df1
#   bacteria           sample     Number_x          Number_y    
#1        A           HM_001          100                30
#2        B           HM_001           50                60
#3        C           HM_001          300                10
#4        D        A2_HM_001          400                20
#5        E        A2_HM_001           22                11
#6        F           HM_002           23                35
#7        G           HM_002          120                46
#8        H           HM_003           50                51
# … with 1,342 more rows

Grouped by sample, I wish to perform a row-wise two-sided Fisher's exact test for each bacterium, comparing it against all other bacteria in the same sample (the first two tables for HM_001 are shown below).

HM_001                           Number_x   Number_y
A                                     100         30
Others (B and C in this case)         350         70

HM_001                           Number_x   Number_y
B                                      50         60
Others (A and C in this case)         400         40

and so forth, essentially generating a p-value for each of the 1350 rows in the dataframe.

Below is my attempt:

Fisher_result <- df1 %>%   
  group_by(sample) %>% 
  row_wise_fisher_test(as.matrix(df1[,c(3,4)]), p.adjust.method = "BH")

But it didn't work, outputting the following error message:

Error in row_wise_fisher_test(., as.matrix(df1[, c(3, 4)]),  : 
  A cross-tabulation with two columns required

Any pointers will be greatly appreciated!

JujutsuR

1 Answer

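The original call fails because %>% passes the whole grouped data frame as the first argument of row_wise_fisher_test, which expects a cross-tabulation with exactly two columns. Instead, you can group_by each sample, apply row_wise_fisher_test to each group inside summarise, and then use unnest_wider and unnest to expand the results into separate columns.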

library(dplyr)
library(tidyr)
library(rstatix)

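# Note: cur_data() is deprecated since dplyr 1.1.0;
# pick(starts_with('Number')) is the replacement there.
df1 %>%
  group_by(sample) %>%
  # Run the test once per sample on the two count columns only
  summarise(data = list(row_wise_fisher_test(as.matrix(select(cur_data(), 
                        starts_with('Number'))), p.adjust.method = "BH"))) %>%
  # Expand the stored test results into one row per tested row
  unnest_wider(data) %>%
  unnest(c(group:p.adj.signif)) -> Fisher_result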

Fisher_result

# sample    group     n        p    p.adj p.adj.signif
#  <chr>     <chr> <int>    <dbl>    <dbl> <chr>       
#1 A2_HM_001 1       453 1.73e- 6 1.73e- 6 ****        
#2 A2_HM_001 2       453 1.73e- 6 1.73e- 6 ****        
#3 HM_001    1       550 1.18e- 1 1.18e- 1 ns          
#4 HM_001    2       550 9.31e-24 1.40e-23 ****        
#5 HM_001    3       550 1.57e-26 4.71e-26 ****        
#6 HM_002    1       224 1.44e- 5 1.44e- 5 ****        
#7 HM_002    2       224 1.44e- 5 1.44e- 5 ****        
#8 HM_003    1       101 1.00e+ 0 1.00e+ 0 ns         
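
To make the per-row comparison explicit, here is a minimal base R sketch of the same idea: within each sample, every bacterium is tested against the sum of all the others, as in the 2x2 tables from the question. The helper name one_vs_rest is hypothetical, and this assumes (it is not verified against the rstatix source) that row_wise_fisher_test builds its tables the same way, so treat it as a cross-check rather than a replacement.

```r
# Sample data, identical to the df1 used in this answer
df1 <- data.frame(
  bacteria = c("A", "B", "C", "D", "E", "F", "G", "H"),
  sample   = c("HM_001", "HM_001", "HM_001", "A2_HM_001", "A2_HM_001",
               "HM_002", "HM_002", "HM_003"),
  Number_x = c(100L, 50L, 300L, 400L, 22L, 23L, 120L, 50L),
  Number_y = c(30L, 60L, 10L, 20L, 11L, 35L, 46L, 51L)
)

# Hypothetical helper: one-vs-rest Fisher test for every row of one sample
one_vs_rest <- function(d) {
  total_x <- sum(d$Number_x)
  total_y <- sum(d$Number_y)
  p <- vapply(seq_len(nrow(d)), function(i) {
    # Row 1: this bacterium, row 2: all other bacteria in the sample
    tab <- matrix(
      c(d$Number_x[i], total_x - d$Number_x[i],
        d$Number_y[i], total_y - d$Number_y[i]),
      nrow = 2
    )
    fisher.test(tab, alternative = "two.sided")$p.value
  }, numeric(1))
  data.frame(
    sample   = d$sample,
    bacteria = d$bacteria,
    p        = p,
    p.adj    = p.adjust(p, method = "BH")  # BH adjustment within the sample
  )
}

# Split by sample, test each piece, and stack the results
result <- do.call(rbind, lapply(split(df1, df1$sample), one_vs_rest))
rownames(result) <- NULL
result
```

Because the bacteria column is carried through the helper, each p-value stays attached to its bacterium without a separate join.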

data

df1 <- structure(list(bacteria = c("A", "B", "C", "D", "E", "F", "G", 
"H"), sample = c("HM_001", "HM_001", "HM_001", "A2_HM_001", "A2_HM_001", 
"HM_002", "HM_002", "HM_003"), Number_x = c(100L, 50L, 300L, 
400L, 22L, 23L, 120L, 50L), Number_y = c(30L, 60L, 10L, 20L, 
11L, 35L, 46L, 51L)), class = "data.frame", row.names = c(NA, -8L))
Ronak Shah
  • I got this error: `Error: Problem with summarise() input data. x A cross-tabulation with two columns required ℹ Input data is list(...). ℹ The error occurred in group 1: sample = "HM_001".` – JujutsuR Feb 16 '21 at 10:22
  • @JujutsuR Can you provide a reproducible example using which I can reproduce the error that you get? – Ronak Shah Feb 16 '21 at 10:44
  • My dataframe is exactly like the one in my question, except that the last two columns are named Reads_Community_T and Reads_Community_N instead of Number_x and Number_y. – JujutsuR Feb 16 '21 at 10:51
  • `df1 %>% group_by(sample) %>% summarise(data = list(row_wise_fisher_test(as.matrix(select(cur_data(), starts_with('Reads_Community_T'))), p.adjust.method = "BH"))) %>% unnest_wider(data) %>% unnest(c(group:p.adj.signif)) -> Fisher_result Fisher_result` – JujutsuR Feb 16 '21 at 10:53
  • @JujutsuR You should use `starts_with('Reads_Community')` and not `starts_with('Reads_Community_T')`. I also included the data that I used, and it works for me without any error on that data as shown. – Ronak Shah Feb 16 '21 at 10:55
  • Is it possible to also include the bacteria column from the original dataframe in the output? – JujutsuR Feb 16 '21 at 11:04
  • I guess as we're doing it row wise, a `cbind()` should do the trick – JujutsuR Feb 16 '21 at 11:14
  • `cbind` would work as well but we can also include the code in the same pipe. Something like this. `df1 %>% group_by(sample) %>% summarise(data = list(row_wise_fisher_test(as.matrix(select(cur_data(), starts_with('Number'))), p.adjust.method = "BH")),bacteria = list(bacteria)) %>% unnest_wider(data) %>% unnest(c(group:p.adj.signif, bacteria))`. – Ronak Shah Feb 16 '21 at 11:17
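
The column-selection problem discussed in the comments can be checked quickly in base R before running the test. This is only a sketch: the column names below are assumptions based on the comment thread, and startsWith stands in for dplyr's starts_with to show which columns each prefix would match.

```r
# Hypothetical column names, taken from the comment thread
cols <- c("bacteria", "sample", "Reads_Community_T", "Reads_Community_N")

# Too specific: matches a single column, so the resulting matrix has one
# column and the "two columns required" check fails
narrow <- cols[startsWith(cols, "Reads_Community_T")]

# Shared prefix: matches both count columns
wide <- cols[startsWith(cols, "Reads_Community")]

length(narrow)  # 1
length(wide)    # 2
```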