0

I'm trying to perform z-tests in R for equality of proportions for data with several groups. Following the bottom-most suggestion on this SO post, I have been attempting to use map2 from purrr to compare the proportion of cases between males and females separately for each condition and for children and adults. That is, I want to compare the proportion of cases between boys and girls with condition A, between men and women with condition A, between boys and girls with condition B, etc.

In the sample data, cond_total is the number of cases for each combination of (male x adult x condition), and total is the total number of people for each combination of (male x adult x condition).

library(tidyverse)
library(broom)
library(magrittr)

# sample data

male <- factor(c(0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1))
adult <- factor(rep(0:1, 10))
condition <- factor(rep(c("A", "B", "C", "D", "E"), each = 4))
cond_total <- c(sample(25:40, 20, replace = T))
total <- c(sample(50:200, 20, replace = T))

df <- data.frame(male, adult, condition, cond_total, total)

When I directly follow the SO example linked above, the function runs, but I'm clearly not correctly incorporating the comparison structure that I want, because I get a single test of proportions for each unique combination of variables:

df %>% mutate(zvalue = map2(cond_total, total, ~ prop.test(.x, .y) %>% tidy())) %>% unnest(zvalue)

The same thing happens if I try to use group_by to indicate that each (adult x condition) group should be compared between the two gender groups.

df %>% group_by(male) %>% mutate(zvalue = map2(cond_total, total, ~ prop.test(.x, .y) %>% tidy())) %>% unnest(zvalue)

If I try to use nest() to introduce the grouping structure (which I think I'm doing correctly, but perhaps not), I get two different error messages. On my home computer, I get "Error: column zvalue must be length 10 (the number of rows) or one, not 20." On my work computer, I get "Error: object cond_total not found."

df %>% nest(-c(adult, condition)) %>% mutate(zvalue = map2(cond_total, total, ~ prop.test(.x, .y) %>% tidy())) %>% unnest(zvalue)

I have also looked at this SO post and am trying to figure out if my problem is exactly the same - the complexity of the solution described there was very surprising to me, and I'm not sure if it's what I need.

Session info from my home computer:

R version 3.6.1 (2019-07-05) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS 10.16

Session info from my work computer:

R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763)

Kellan Baker
  • 375
  • 3
  • 11
  • IN the nest, are you grouping by adult, condition? – akrun Feb 02 '21 at 23:58
  • What is `zvalue`? in your code doesnt appear. – Juan Camilo Rivera Palacio Feb 03 '21 at 00:00
  • Do you need `df %>% nest_by(male) %>% mutate(value = list(prop.test(data$cond_total, data$total) %>% tidy)) %>% ungroup %>% select(male, value) %>% unnest(c(value))` – akrun Feb 03 '21 at 00:03
  • @JuanCamiloRiveraPalacio I fixed the typo. Zvalue is just the variable name for the tidied results of the prop.test. – Kellan Baker Feb 03 '21 at 00:20
  • @akrun not quite - looking at the output of your suggestion, I need the prop.test comparison between `estimate1` (adult males with condition A) and `estimate3` (adult females with condition A), between `estimate2` (young males with condition A) and `estimate4` (young females with condition A), etc. – Kellan Baker Feb 03 '21 at 00:22
  • @akrun so I can get what I need by manually extracting the indices from the ordered data frame: `prop.test(c(df$cond_total[1], df$cond_total[3]), c(df$total[1], df$total[3]))`, `prop.test(c(df$cond_total[2], df$cond_total[4]), c(df$total[2], df$total[4]))`, etc., but this is such an error-prone approach and obviously shows how limited my programming skills are. – Kellan Baker Feb 03 '21 at 15:41

1 Answers1

2

You can run a series of prop.test() analyses using gtsummary package.

library(gtsummary)
packageVersion("gtsummary")
#> ‘1.3.6’

trial %>%
  select(response, death, trt) %>%
  tbl_summary(
    by = trt,
    missing = "no",
    statistic = everything() ~ "{p}%"
  ) %>%
  add_difference(test = everything() ~ "prop.test") %>%
  modify_footnote(all_stat_cols() ~ NA)

enter image description here

Dharman
  • 30,962
  • 25
  • 85
  • 135
Daniel D. Sjoberg
  • 8,820
  • 2
  • 12
  • 28