Data:

variant disease control total
A1         1      53    54
A2         6      2     8
A3         15     37    52
A4         0      53    53
A5         65     4     69
A6         4      5     9
A7         3      34    37

I would like to add a row at the bottom with column totals for the disease and control columns, and then run a Fisher's exact test per row, adding another column with the p-values from the test.

Desired outcome (p-values made up):

variant disease control total p-value
A1         1      53    54    0.001
A2         6      2     8     0.6921
A3         15     37    52    1
A4         0      53    53    0.98
A5         65     4     69    0.68
A6         4      5     9     0.63
A7         3      34    37    0.832
C_total    94     188

I've tried:

rbind(df, colSums(df[,2:3]), fill=TRUE) 

But this gives me all the column totals in the final two columns.

I'm not sure about the Fisher's test yet, but I imagine some form of apply function that uses each row together with the column totals to build a 2x2 table.

Many thanks

tacrolimus

2 Answers

For the first of your questions:

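library(data.table)  # assumes df is already a data.table
# Bind the column sums as a one-row table, then label that last row "Total"
rbind(df, rbind(colSums(df[,2:3])), fill = TRUE)[ (.N == seq_len(.N)), variant := "Total"][]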
#    variant disease control total p-value
# 1:      A1       1      53    54  0.0010
# 2:      A2       6       2     8  0.6921
# 3:      A3      15      37    52  1.0000
# 4:      A4       0      53    53  0.9800
# 5:      A5      65       4    69  0.6800
# 6:      A6       4       5     9  0.6300
# 7:      A7       3      34    37  0.8320
# 8:   Total      94     188    NA      NA
r2evans

One dplyr and tibble solution could be:

library(dplyr)
library(tibble)

df %>%
 add_row(variant = "Total", !!!colSums(df[-1])) %>%
 rowwise() %>%
 mutate(p_value = chisq.test(c_across(c(disease, control)), p = c(0.5, 0.5))$p.value)

  variant disease control total  p_value
  <chr>     <dbl>   <dbl> <dbl>    <dbl>
1 A1            1      53    54 1.48e-12
2 A2            6       2     8 1.57e- 1
3 A3           15      37    52 2.28e- 3
4 A4            0      53    53 3.34e-13
5 A5           65       4    69 2.08e-13
6 A6            4       5     9 7.39e- 1
7 A7            3      34    37 3.46e- 7
8 Total        94     188   282 2.17e- 8

Since I assume you want to test whether the counts in the two groups are equal, a chi-square goodness-of-fit test could be used.
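To make the goodness-of-fit idea concrete, here is a minimal sketch for a single row (variant A2, counts 6 and 2) that compares `chisq.test()` against the same statistic computed by hand. The variable names are illustrative only, and the warning R raises for small expected counts is suppressed here.

```r
# Observed counts for variant A2: disease = 6, control = 2
obs <- c(disease = 6, control = 2)

# Goodness-of-fit test against equal proportions (0.5 / 0.5)
p_gof <- suppressWarnings(chisq.test(obs, p = c(0.5, 0.5))$p.value)

# Same calculation by hand: sum((O - E)^2 / E) on 1 degree of freedom
expected <- sum(obs) * c(0.5, 0.5)
stat <- sum((obs - expected)^2 / expected)
p_manual <- pchisq(stat, df = 1, lower.tail = FALSE)

print(c(p_gof = p_gof, p_manual = p_manual))
```

Both values should agree with the A2 entry in the output above (about 0.157). Note that this tests whether the two counts in a row are equal; it does not take the column totals into account.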

tmfmnk
  • Thanks for taking the time to answer. I think what is actually needed are two further columns per variant: disease with the variant and disease without it (e.g. for variant A1 that would be 93, i.e. 94-1), and likewise control with and without the variant. Then run a Fisher's test per row across those four counts, if that makes sense? – tacrolimus Oct 15 '20 at 21:13
  • What are you trying to test? – tmfmnk Oct 15 '20 at 21:16
  • Whether, per variant, there is an enrichment of "disease" cases versus "control" cases. The numbers represent the number of people with the variant from each cohort. – tacrolimus Oct 15 '20 at 21:17
  • Then a chi-square goodness of fit test should be appropriate as you are testing whether two counts are the same. – tmfmnk Oct 15 '20 at 21:20
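The per-row 2x2 approach described in the first comment could be sketched in base R roughly as follows. This is only an illustration, not a recommendation of one test over the other: it assumes, as the comment does, that the column totals (94 and 188) stand in for the cohort sizes, and the names `n_disease`, `n_control`, `p_value` and `out` are made up for the example.

```r
# Rebuild the example data from the question
df <- data.frame(
  variant = paste0("A", 1:7),
  disease = c(1, 6, 15, 0, 65, 4, 3),
  control = c(53, 2, 37, 53, 4, 5, 34)
)
df$total <- df$disease + df$control

# Assumption: column totals are used as the cohort sizes
n_disease <- sum(df$disease)
n_control <- sum(df$control)

# One Fisher's exact test per row on the 2x2 table
#   rows: with variant / without variant; columns: disease / control
df$p_value <- mapply(function(d, k) {
  tab <- matrix(c(d, n_disease - d, k, n_control - k), nrow = 2)
  fisher.test(tab)$p.value
}, df$disease, df$control)

# Append the totals row; total and p_value are left as NA, as in the desired output
out <- rbind(
  df,
  data.frame(variant = "C_total", disease = n_disease, control = n_control,
             total = NA, p_value = NA)
)
print(out)
```

Because seven tests are run, a multiple-testing adjustment such as `p.adjust(df$p_value, method = "BH")` may be worth considering.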