
I want to cross-tabulate member and author in the rows against review, publish and pay in the columns, showing row and column totals with percentages in brackets and a chi-square test in the footnote.

#data
set.seed(123)
member <- sample(c("Yes", "No"), 100, replace = TRUE)
author <- sample(c("Yes", "No"), 100, replace = TRUE)
review <- sample(0:10, 100, replace = TRUE)
publish <- sample(0:10, 100, replace = TRUE)
pay <- sample(0:10, 100, replace = TRUE)
data <- data.frame(member, author, review, publish, pay)

I recently found out about gtsummary, which should produce the result I want, but I'm struggling to replicate it. I want review, publish and pay to be grouped as No (0-4), Maybe (5) and Yes (6-10), as in the code below. So far I have this with tidyverse:

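library(dplyr)
library(kableExtra)

# Count responses per group for one variable (review), by member
data |>
  group_by(member) |>
  summarise(
    Disagree = sum(review < 5),
    Neutral = sum(review == 5),
    Agree = sum(review > 5)) |>
  kbl(caption = "Review by member") |>
  kable_paper("hover", full_width = FALSE, html_font = "Cambria")

# Fisher's exact test with a simulated p-value (on the raw 0-10 scores)
fisher.test(table(data$member, data$review), simulate.p.value = TRUE)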

Thanks for your help. I could not post the image because I need 10 reputation (I don't know what that means)

The preferred output has review, publish and pay as three columns, each with the groups No, Maybe and Yes.

Cherson

1 Answer


Update: We could add tbl_split(., c(author, review_group, publish_group, pay_group)) to the end of the code:

Here you will get 4 separate tables that you could put side by side:

library(dplyr)
library(gtsummary)

data %>%
  mutate(across(c(review, publish, pay), ~cut(., breaks = c(-Inf, 4.5, 5.5, Inf),
                                              labels = c("No", "Maybe", "Yes"),
                                              include.lowest = TRUE), .names = "{.col}_group")) %>% 
  select(member, author, ends_with("group")) %>%
  tbl_summary(
    by = member,
    missing = "no", 
    statistic = list(all_categorical() ~ "{n} ({p}%)"),
    digits = list(all_categorical() ~ c(0, 1))
  ) %>%
  add_p(test = all_categorical() ~ "chisq.test") %>% 
  tbl_split(., c(author, review_group, publish_group, pay_group))

First answer: We could do it this way:

library(dplyr)
library(gtsummary)

data %>%
  mutate(across(c(review, publish, pay), ~cut(., breaks = c(-Inf, 4.5, 5.5, Inf),
                                              labels = c("No", "Maybe", "Yes"),
                                              include.lowest = TRUE), .names = "{.col}_group")) %>% 
  select(member, author, ends_with("group")) %>%
  tbl_summary(
    by = member,
    missing = "no", 
    statistic = list(all_categorical() ~ "{n} ({p}%)"),
    digits = list(all_categorical() ~ c(0, 1))
  ) %>%
  add_p(test = all_categorical() ~ "chisq.test")
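To check that the breaks in cut() give the grouping asked for in the question (0-4 = No, 5 = Maybe, 6-10 = Yes), here is a minimal base R sketch. The names scores and groups are only for illustration and are not part of the answer's pipeline.

```r
# Apply the same breaks and labels as in the answer to every possible score.
scores <- 0:10
groups <- cut(scores,
              breaks = c(-Inf, 4.5, 5.5, Inf),
              labels = c("No", "Maybe", "Yes"))

# Show which group each score lands in, then count scores per group.
data.frame(score = scores, group = groups)
table(groups)
```

Because the lowest break is -Inf, include.lowest = TRUE makes no practical difference for these scores; leaving it in does no harm.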


TarJae
  • Dear TarJae, this is so good, thank you very much. Is it possible to have the result in a wide format, i.e. "publish_group" and "pay_group" next to "review_group"? Just to save space in the report – Cherson May 12 '23 at 17:26
  • Thank you for the update. I could also use tbl_merge to combine the results, but that would mean running the code for each variable and then merging. I will use the first one as is. Thanks again for your help. – Cherson May 12 '23 at 20:47
  • I just noticed the tbl_summary doesn't give the row and column totals like in tbl_cross, is there a way to tweak the code to get this result? – Cherson May 12 '23 at 22:46
  • I tried to fix it using this code ``` data %>% mutate(across(c(review, publish, pay), ~cut(., breaks = c(-Inf, 4.5, 5.5, Inf), labels = c("No", "Maybe", "Yes"), include.lowest = TRUE), .names = "{.col}_group")) %>% select(member, author, ends_with("group")) %>% tbl_cross( row = member, col = select( review_group, publish_group,pay_group), missing = "no", statistic = list(all_categorical() ~ "{n} ({p}%)"), digits = list(all_categorical() ~ c(0, 1)) ) ``` – Cherson May 12 '23 at 22:56
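
On the row and column totals raised in the last two comments: tbl_cross() takes a single row variable and a single column variable, so passing select(...) to col will not work. One possible approach, sketched here under that assumption, is to build one tbl_cross() table per grouped variable. The names grouped, group_vars and cross_tables are made up for this sketch, and it has not been checked against every gtsummary version.

```r
library(dplyr)
library(gtsummary)

# Same example data as in the question
set.seed(123)
data <- data.frame(
  member  = sample(c("Yes", "No"), 100, replace = TRUE),
  author  = sample(c("Yes", "No"), 100, replace = TRUE),
  review  = sample(0:10, 100, replace = TRUE),
  publish = sample(0:10, 100, replace = TRUE),
  pay     = sample(0:10, 100, replace = TRUE)
)

# Recode the 0-10 scores into No / Maybe / Yes, as in the answer
grouped <- data %>%
  mutate(across(c(review, publish, pay),
                ~ cut(.x, breaks = c(-Inf, 4.5, 5.5, Inf),
                      labels = c("No", "Maybe", "Yes")),
                .names = "{.col}_group"))

# One cross table per column variable; tbl_cross() adds the Total row and column
group_vars <- c("review_group", "publish_group", "pay_group")
cross_tables <- lapply(group_vars, function(v) {
  grouped %>%
    tbl_cross(row = member, col = all_of(v),
              percent = "cell", missing = "no") %>%
    add_p(source_note = TRUE)  # report the test in a note under the table
})
names(cross_tables) <- group_vars

cross_tables$review_group
```

By default add_p() on a tbl_cross picks the test from the expected cell counts, so add test = "chisq.test" if the chi-square test is required regardless. The same loop with row = author gives the tables for author.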