I have grouped data that I'm performing a chi-squared test on, and I would like to return a summary table that includes multiple values from the htest object. For example (from a previous question),

library(dplyr)

set.seed(1)
foo <- data.frame(
  partido=sample(c("PRI", "PAN"), 100, 0.6),
  genero=sample(c("H", "M"), 100, 0.7), 
  GM=sample(c("Bajo", "Muy bajo"), 100, 0.8)
)

foo %>% 
  group_by(GM) %>% 
  summarise(p.value=chisq.test(partido, genero)$p.value)

returns the p-value, but instead I would like multiple values (say p.value and statistic) from the htest object to be returned as different columns in the summary table.

I've tried

foo %>%
  group_by(GM) %>%
  summarise(htest=chisq.test(partido, genero)) %>%
  mutate(p.value=htest$p.value, statistic=htest$statistic)

but that throws an error

Error in summarise_impl(.data, dots) :
Column htest must be length 1 (a summary value), not 9

How do you accomplish this with the tidyverse tools?

merv

2 Answers

Another option is to use broom::tidy, which turns each htest object into a one-row tibble:

library(broom)
library(tidyverse)
foo %>%
    group_by(GM) %>%
    nest() %>%
    transmute(
        GM,
        res = map(data, ~tidy(chisq.test(.x$partido, .x$genero)))) %>%
    unnest(res)
## A tibble: 2 x 5
#  GM      statistic p.value parameter method
#  <fct>       <dbl>   <dbl>     <int> <chr>
#1 Bajo       0.0157   0.900         1 Pearson's Chi-squared test with Yates' c…
#2 Muy ba…    0.504    0.478         1 Pearson's Chi-squared test with Yates' c…
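
To see why this works, it may help to look at a single test result in isolation. The sketch below uses a hypothetical 2 x 2 table of made-up counts (`tab` is not part of the question's data) and assumes the broom package is installed. It shows that the htest object is a list with nine components, which is the "not 9" in the question's error message, and that tidy() reduces it to one row.

```r
library(broom)

# Hypothetical counts, used only for illustration (not the question's `foo`)
tab <- matrix(c(12, 8, 9, 11), nrow = 2)

# chisq.test() returns an object of class "htest", which is a list
ht <- chisq.test(tab)
n_components <- length(ht)   # number of list components in the htest object
component_names <- names(ht)

# tidy() flattens the htest into a one-row tibble with one column per value
res <- tidy(ht)
res_columns <- names(res)

print(component_names)
print(res)
```

Because each group yields exactly one row, unnesting the list column gives one row per level of GM.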
Maurits Evers

One way would be to nest the data by group (GM) and then use map to get different values from each group.

library(tidyverse)

foo %>%
  group_by(GM) %>%
  nest(partido, genero) %>%
  ungroup() %>%
  mutate(p.value = map_dbl(data, ~ chisq.test(.$partido, .$genero)$p.value),
         statistic = map_dbl(data, ~ chisq.test(.$partido, .$genero)$statistic)) %>%
  select(-data)

#    GM       p.value statistic
#  <fct>      <dbl>     <dbl>
#1 Bajo       0.900    0.0157
#2 Muy bajo   0.478    0.504 

Or if we want to run the test only once, we can store the object in one variable and extract the values of interest.

foo %>%
  group_by(GM) %>%
  nest(partido, genero) %>%
  ungroup() %>%
  mutate(obj = map(data, ~ chisq.test(.$partido,.$genero)), 
         p.value = map_dbl(obj, ~ .$p.value), 
         statistic = map_dbl(obj, ~ .$statistic)) %>%
  select(-data, -obj)
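
The same "run once, then extract" idea can probably also be written without nest(), by wrapping the test result in list() inside summarise() so that each group contributes a single list element. This is a minimal sketch rather than a tested recommendation. It assumes dplyr and purrr are installed, and it writes the question's third positional argument to sample() as an explicit `replace = TRUE`, which is my reading of that call.

```r
library(dplyr)
library(purrr)

# Example data modelled on the question; `replace = TRUE` is an assumption
# about what the original positional argument was meant to do
set.seed(1)
foo <- data.frame(
  partido = sample(c("PRI", "PAN"), 100, replace = TRUE),
  genero  = sample(c("H", "M"), 100, replace = TRUE),
  GM      = sample(c("Bajo", "Muy bajo"), 100, replace = TRUE)
)

res <- foo %>%
  group_by(GM) %>%
  # list() makes the htest a length-1 summary value (a list column)
  summarise(obj = list(chisq.test(partido, genero))) %>%
  # extract the values of interest from the stored object
  mutate(
    p.value   = map_dbl(obj, ~ .x$p.value),
    statistic = map_dbl(obj, ~ .x$statistic)
  ) %>%
  select(-obj)

print(res)
```

The test runs once per group here, and the list() wrapper is what avoids the "must be length 1" error from the question.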
Ronak Shah
  • But that's running the test on every group twice, isn't it? One could do exactly the same by adding the argument `statistic=chisq.test(partido, genero)$statistic` to the `summarise()` call. – merv Feb 02 '19 at 03:58
  • @merv you are right. I have updated the answer which runs the test only once in every group. Though, I am not sure if this is the best way to do this. – Ronak Shah Feb 02 '19 at 04:02