I have a data frame similar to this:

df1 <- data.frame(c(31,3447,12,1966,39,3275),
                  c(20,3460,10,1968,30,3284),
                  c(334,3146,212,1766,338,2976),
                  c(36,3442,35,1943,47,3267),
                  c(81,3399,71,1907,112,3202),
                  c(22,3458,22,1956,42,3272))

colnames(df1) <- c("Site1.C1","Site1.C2","Site2.C1","Site2.C2","Site3.C1","Site3.C2")

df1
  Site1.C1 Site1.C2 Site2.C1 Site2.C2 Site3.C1 Site3.C2
1       31       20      334       36       81       22
2     3447     3460     3146     3442     3399     3458
3       12       10      212       35       71       22
4     1966     1968     1766     1943     1907     1956
5       39       30      338       47      112       42
6     3275     3284     2976     3267     3202     3272

I am converting each row into a table and then performing a chisq test.
To get specific values from the chisq.test result (p-value, parameter, statistic, expected counts, etc.), I have to repeat the test several times over (in a very ugly and cumbersome way), using the following code:

df2 <- df1 %>% rowwise() %>% mutate(P=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$p.value,
                                df=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$parameter,
                                Site1.c1.exp=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$expected[1,1],
                                Site1.c2.exp=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$expected[1,2],
                                Site2.c1.exp=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$expected[2,1],
                                Site2.c2.exp=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$expected[2,2],
                                Site3.c1.exp=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$expected[3,1],
                                Site3.c2.exp=chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))$expected[3,2])

as.data.frame(df2)

  Site1.C1 Site1.C2 Site2.C1 Site2.C2 Site3.C1 Site3.C2            P df Site1.c1.exp Site1.c2.exp Site2.c1.exp Site2.c2.exp Site3.c1.exp Site3.c2.exp
1       31       20      334       36       81       22 2.513166e-08  2     43.40840     7.591603     314.9237     55.07634     87.66794     15.33206
2     3447     3460     3146     3442     3399     3458 2.760225e-02  2   3391.05464  3515.945362    3234.4387   3353.56132   3366.50668   3490.49332
3       12       10      212       35       71       22 4.743725e-04  2     17.92818     4.071823     201.2845     45.71547     75.78729     17.21271
4     1966     1968     1766     1943     1907     1956 1.026376e-01  2   1928.02242  2005.977577    1817.7517   1891.24831   1893.22588   1969.77412
5       39       30      338       47      112       42 2.632225e-10  2     55.49507    13.504934     309.6464     75.35362    123.85855     30.14145
6     3275     3284     2976     3267     3202     3272 2.686389e-02  2   3216.55048  3342.449523    3061.5833   3181.41674   3174.86626   3299.13374

Is there a more elegant way to run the chisq test just once per row, store the result in the same row (e.g. as a list column or tibble), and then extract values into additional columns as needed?
My data frame has over a million rows and some additional variables that are not used in the chisq test.

Thank you.

1 Answer


With input from @akrun, I was able to get the desired result using the following code:

df2 <- df1 %>% rowwise() %>% mutate(result=list(chisq.test(rbind(c(Site1.C1,Site1.C2),c(Site2.C1,Site2.C2),c(Site3.C1,Site3.C2)))))
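
For completeness, here is a sketch of how the individual values might then be pulled out of the stored result column. It assumes dplyr 1.0 or later, where a list column is unwrapped to its single element inside a rowwise() mutate; the output column names are just the ones used in the question. Note that chisq.test may warn about small expected counts on some rows.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  Site1.C1 = c(31, 3447, 12, 1966, 39, 3275),
  Site1.C2 = c(20, 3460, 10, 1968, 30, 3284),
  Site2.C1 = c(334, 3146, 212, 1766, 338, 2976),
  Site2.C2 = c(36, 3442, 35, 1943, 47, 3267),
  Site3.C1 = c(81, 3399, 71, 1907, 112, 3202),
  Site3.C2 = c(22, 3458, 22, 1956, 42, 3272)
)

df2 <- df1 %>%
  rowwise() %>%
  # Run the test once per row and keep the whole htest object in a list column
  mutate(result = list(chisq.test(rbind(c(Site1.C1, Site1.C2),
                                        c(Site2.C1, Site2.C2),
                                        c(Site3.C1, Site3.C2))))) %>%
  # Still rowwise, so `result` refers to this row's htest object
  mutate(P            = result$p.value,
         df           = unname(result$parameter),
         Site1.c1.exp = result$expected[1, 1],
         Site1.c2.exp = result$expected[1, 2],
         Site2.c1.exp = result$expected[2, 1],
         Site2.c2.exp = result$expected[2, 2],
         Site3.c1.exp = result$expected[3, 1],
         Site3.c2.exp = result$expected[3, 2]) %>%
  ungroup()

# Drop the list column once everything needed has been extracted
df2 <- select(df2, -result)
as.data.frame(df2)
```

With over a million rows, keeping the full test object for every row uses a lot of memory, so dropping the result column after extraction (as in the last step) is probably worthwhile.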