Run multiple chisq-tests on one dataset with purrr

Question

I'm very new to the world of R. I have the following test data:

A <- tibble(parasite = sample(0:1, 10, replace = TRUE),
            L1 = sample(0:1, 10, replace = TRUE),
            L2 = sample(0:1, 10, replace = TRUE),
            L3 = sample(0:1, 10, replace = TRUE),
            L4 = sample(0:1, 10, replace = TRUE))

Looks like:

   parasite L1 L2 L3 L4 
1         0  0  1  0  0 
2         1  0  1  1  1 
3         1  1  1  0  1 
4         0  1  1  1  0 
5         1  1  1  1  0 
...10 rows total

What I want to do is to run 4 chisq tests:

1. parasite vs L1
2. parasite vs L2
3. parasite vs L3
4. parasite vs L4

I then want to produce a summary tibble that lists the second variable of each test (L1, L2, ...), the chi-squared statistic, and the p-value (rounded to a reasonable extent). Like:

variable  chisq  pvalue 
L1        1.475    0.0892 
L2       18.453    0.0000E8 
L3        2.4781   0.0012 
L4        0.6785   0.2755

I've seen map used to do something similar, but I can't get it to work. Since I'm learning, any concise method of doing this would be greatly appreciated.

e.g.

map(~chisq.test(.x, data$column)) %>% 
  tibble(names = names(.), data = .) %>% 
  mutate(stats = map(data, tidy)) 
unnest(data,stats)

Can anyone show me how to do this?

Thanks!

If you use `sample()` in your test data, you should be sure to use `set.seed()` to make it reproducible. Otherwise it's hard to make sure we get the same value you expect. — MrFlick, May 11 '18 at 20:08

score 2 · Answer 1 · answered May 11 '18 at 20:20

Here's one way: get the data into a long shape, use do on a grouped data frame to call chisq.test, then tidy that output using broom.

library(tidyverse)

set.seed(1)
A <- tibble(parasite = sample(0:1, 10, rep = TRUE),
            L1 = sample(0:1, 10, rep = TRUE),
            L2 = sample(0:1, 10, rep = TRUE),
            L3 = sample(0:1, 10, rep = TRUE),
            L4 = sample(0:1, 10, rep = TRUE))

A %>%
    gather(key = variable, value = value, -parasite) %>%
    group_by(variable) %>%
    do(chisq.test(.$parasite, .$value) %>% broom::tidy())
#> # A tibble: 4 x 5
#> # Groups:   variable [4]
#>   variable statistic p.value parameter method                             
#>   <chr>        <dbl>   <dbl>     <int> <chr>                              
#> 1 L1        0.         1             1 Pearson's Chi-squared test         
#> 2 L2        2.93e-32   1.000         1 Pearson's Chi-squared test with Ya…
#> 3 L3        0.         1             1 Pearson's Chi-squared test         
#> 4 L4        2.34e- 1   0.628         1 Pearson's Chi-squared test with Ya…

Created on 2018-05-11 by the reprex package (v0.2.0).
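To get closer to the three-column summary asked for in the question, the tidied output can be trimmed and rounded. This is only a sketch: the data below is hard-coded so the example does not depend on the random number generator (the first five rows match the rows shown in the question, the last five are made up), and the column names chisq and pvalue are taken from the desired output in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical fixed data: rows 1-5 follow the question, rows 6-10 are invented.
A <- tibble(
  parasite = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0),
  L1       = c(0, 0, 1, 1, 1, 0, 1, 0, 1, 1),
  L2       = c(1, 1, 1, 1, 1, 0, 1, 0, 0, 1),
  L3       = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0),
  L4       = c(0, 1, 1, 0, 0, 1, 0, 0, 1, 0)
)

summary_tbl <- A %>%
  # Long shape: one row per (parasite, variable, value) combination
  gather(key = variable, value = value, -parasite) %>%
  group_by(variable) %>%
  # One chisq.test per variable; with only 10 rows R will warn that the
  # chi-squared approximation may be incorrect
  do(broom::tidy(chisq.test(.$parasite, .$value))) %>%
  ungroup() %>%
  # Keep only the requested columns, renamed and rounded
  transmute(variable,
            chisq  = round(statistic, 3),
            pvalue = signif(p.value, 3))

summary_tbl
```

round() and signif() are just one choice of rounding; format.pval() from base R is another option if the p-values only need to be printed.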

score 0 · Answer 2 · answered May 11 '18 at 20:19

It's probably better to reshape your data into a long (tidy) format. Then you can use nest() to perform the test in groups. For example:

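A %>% 
  gather("variable", "measure", -parasite) %>% 
  group_by(variable) %>% 
  # nest() on a grouped data frame nests everything except the grouping column
  nest() %>% 
  mutate(stats = map(data, ~broom::tidy(chisq.test(.$parasite, .$measure)))) %>% 
  select(-data) %>% 
  # name the list-column explicitly so newer versions of tidyr do not warn
  unnest(stats)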

Alternatively, you can use do():

A %>% 
  gather("variable", "measure", -parasite) %>% 
  group_by(variable) %>% 
  do(broom::tidy(chisq.test(.$parasite, .$measure)))
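Since the question asks about purrr specifically, here is a sketch of a variant that skips the reshaping and maps over the L columns directly. It is not taken from either answer. The data is hard-coded for reproducibility (the first five rows match the question, the rest are made up), and the name results is only illustrative.

```r
library(dplyr)
library(purrr)

# Hypothetical fixed data: rows 1-5 follow the question, rows 6-10 are invented.
A <- tibble(
  parasite = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0),
  L1       = c(0, 0, 1, 1, 1, 0, 1, 0, 1, 1),
  L2       = c(1, 1, 1, 1, 1, 0, 1, 0, 0, 1),
  L3       = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0),
  L4       = c(0, 1, 1, 0, 0, 1, 0, 0, 1, 0)
)

results <- A %>%
  select(-parasite) %>%
  # A data frame is a list of columns, so map() visits L1, L2, L3, L4 in turn;
  # each call tests parasite against that column and tidies the result
  map(~ broom::tidy(chisq.test(A$parasite, .x))) %>%
  # Stack the one-row tibbles, storing the list names in a "variable" column
  bind_rows(.id = "variable")

results
```

The result has the same columns as the do() version (variable, statistic, p.value, parameter, method), so it can be trimmed with select() in the same way.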

Thanks so much for the help! Looks like I was way off in my approach. Now to do some homework on the new functions I just encountered. — Fishguy, May 14 '18 at 12:30
