How do you include confidence intervals for proportions in gtsummary with by?

Question

I have tried adding confidence intervals in gtsummary, but I get this error: `#> Error: Dimension of 'a1' and the added statistic do not match. Expecting statistic to be length 2.` I managed to add the intervals successfully when I don't stratify by any variable. The code is below; sorry if it's too verbose.

#---- Libraries
library(gtsummary)
library(tidyverse)


#---- Data

set.seed(2021)

df <- tibble(
  
  a1 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  a2 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  gender = gl(2, 15, labels = c("Males", "Females")),
  b2 = gl(3, 10, labels = c("Primary", "Secondary", "Tertiary")),
  c1 = gl(3, 10, labels = c("15-19", "20-24", "25-30")),
  outcome = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  weight = runif(30, 1, 12)
)


#---- Function to calculate CIs

categorical_ci <- function(variable, tbl, ...) {
  
  filter(tbl$meta_data, variable == .env$variable) %>%
    pluck("df_stats", 1) %>%
    mutate(
      # calculate and format 95% CI
      prop_ci = map2(n, N, ~prop.test(.x, .y)$conf.int %>%
                       style_percent(symbol = TRUE)),
      ci = map_chr(prop_ci, ~glue::glue("{.x[1]}, {.x[2]}"))
    ) %>%
    pull(ci)
}



#---- tbl_summary stratified by gender

t1 <- df %>%
  select(gender, a1, a2) %>%
  tbl_summary(by = gender, statistic = everything() ~ "{n} {p}%",
              type = everything() ~ "categorical")


t1 %>%
  add_stat(
    fns = everything() ~ "categorical_ci",
    location = "level",
    header = "**95% CI**"
  ) %>%
  modify_footnote(everything() ~ NA)

Can you include a screenshot of what you'd like the final table to look like? – Daniel D. Sjoberg, Mar 31 '21 at 10:42
Here is an example that uses a survey dataset, but the structure is the same for a regular data frame (you'll just need to update the `ci` function). https://stackoverflow.com/questions/66814238/using-gtsummary-tbl-svysummaary-function-to-display-confidence-intervals-for-s – Daniel D. Sjoberg, Mar 31 '21 at 11:00
@DanielD.Sjoberg I have added an image of the desired output for one variable. I would like the same for all of the desired variables. – Moses, Mar 31 '21 at 11:31

score 0 · Answer 1 · answered Mar 31 '21 at 11:39

There is a similar question here: https://community.rstudio.com/t/tbl-summary-function/100113/6

library(gtsummary)

ll <- function(x) t.test(x)$conf.int[[1]] # Lower 95% CI of mean
ul <- function(x) t.test(x)$conf.int[[2]] # Upper 95% CI of mean

# create table 1
table <-
  trial %>%
  select(trt, age) %>%
  tbl_summary(
    by = trt,
    statistic = all_continuous() ~ "{mean} ({ll} — {ul})",
    missing = "no",
    digits = all_continuous() ~ 2
  ) %>%
  modify_footnote(all_stat_cols() ~ "Mean (95% CI)")

Daniel D. Sjoberg

Thank you @Daniel. It looks nice. The continuous variable works perfectly fine. The challenge is the categorical variable. I'm not sure whether it can be replicated for categorical variables. – Moses Mar 31 '21 at 11:47
I am not sure what you want the result to look like. Here's an example from a previous post: https://community.rstudio.com/t/how-can-i-add-a-confidence-interval-in-tbl-summary/90109/2 – Daniel D. Sjoberg Mar 31 '21 at 11:52
Oh, I think the last example in this help file is a bit clearer than the link I sent: http://www.danieldsjoberg.com/gtsummary/reference/add_stat.html This does require the dev version of gtsummary, however. – Daniel D. Sjoberg Mar 31 '21 at 12:30
Thank you @Daniel, it looks nice. However, when I try to stratify it by treatment (`tbl_summary(missing = "no", by = trt)`), the CIs do not align correctly with the levels of stratification. Sorry if I was not clear at first. – Moses Mar 31 '21 at 12:42
To do this for each level of a categorical variable, you'll need a function that you pass to `add_stat()` that also takes into account the `by=` variable (like the one in this link https://stackoverflow.com/questions/66814238/using-gtsummary-tbl-svysummaary-function-to-display-confidence-intervals-for-s, but yours is more complex because of the multiple levels). – Daniel D. Sjoberg Mar 31 '21 at 12:45
Yes @Daniel, I think your last example now reflects what I requested. That is where this error was generated: `#> Error: Dimension of 'grade' and the added statistic do not match. Expecting statistic to be length 3.` – Moses Mar 31 '21 at 12:46
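
The length mismatch in these errors is consistent with the summary statistics holding one row per combination of variable level and `by` group, so a function that returns one interval per row produces more values than the variable has levels. Below is a minimal base R sketch of the reshaping idea, written outside of gtsummary: it computes one interval per level and group, then arranges them with one row per level and one column per group. The helper `fmt_ci` and the choice of within-group (column) denominators are assumptions made for illustration; they are not part of gtsummary.

```r
# Toy data resembling the question's: a two-level factor and a by-variable.
set.seed(2021)
df <- data.frame(
  a1 = factor(ifelse(rnorm(30) < 0, "No", "Yes"), levels = c("No", "Yes")),
  gender = gl(2, 15, labels = c("Males", "Females"))
)

# Counts: rows are the levels of a1, columns are the by-groups.
counts <- table(df$a1, df$gender)
col_totals <- colSums(counts)  # denominator within each gender (assumption)

# Hypothetical helper: format a 95% CI from prop.test() as "lo%, hi%".
fmt_ci <- function(n, N) {
  ci <- prop.test(n, N)$conf.int
  sprintf("%.0f%%, %.0f%%", 100 * ci[1], 100 * ci[2])
}

# One CI per (level, group) cell, kept in the same layout as `counts`:
# as.vector() walks the table column by column, so each group's total is
# repeated once per level to line up with it.
ci_wide <- matrix(
  mapply(fmt_ci, as.vector(counts), rep(col_totals, each = nrow(counts))),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)

ci_wide
```

Each column of `ci_wide` has exactly one entry per level of `a1`, which is the shape a per-group CI column needs before it can sit next to the stratified summary columns.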

score 0 · Accepted Answer · answered Mar 31 '21 at 16:14

#---- Libraries
library(gtsummary)
library(flextable)
library(tidyverse)


#---- Data

set.seed(2021)

df <- tibble(

  a1 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  a2 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  gender = gl(2, 15, labels = c("Males", "Females")),
  b2 = gl(3, 10, labels = c("Primary", "Secondary", "Tertiary")),
  c1 = gl(3, 10, labels = c("15-19", "20-24", "25-30")),
  outcome = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  weight = runif(30, 1, 12)
)


#---- Solution ----

tbl <-
  df %>%
  select(a1, a2, gender) %>%
  tbl_summary(missing = "no",  by = gender, type = everything() ~ "categorical",
              percent = "row") %>%
  add_n() %>%
  modify_footnote(everything() ~ NA)


# Build one CI string per by-group and level from the statistics stored in
# the table, using a Wald interval: p +/- z * sqrt(p * (1 - p) / N).
myci <- tbl$meta_data %>%
  filter(summary_type %in% c("categorical", "dichotomous")) %>%
  select(summary_type, var_label, df_stats) %>%
  unnest(df_stats) %>%
  mutate(
    conf.low = (p - qnorm(0.975) * sqrt(p * (1 - p) / N)) %>%
      style_percent(symbol = TRUE),
    conf.high = (p + qnorm(0.975) * sqrt(p * (1 - p) / N)) %>%
      style_percent(symbol = TRUE),
    ci = str_glue("{conf.low}, {conf.high}"),
    # label and row_type are the keys used to join onto the table body
    label = coalesce(variable_levels, var_label),
    row_type = ifelse(summary_type == "dichotomous", "label", "level")
  ) %>%
  select(by, variable, row_type, label, ci) %>%
  # one row per variable level, one CI column per level of `by`
  pivot_wider(names_from = "by", values_from = "ci") %>%
  rename(Male_ci = Males, Female_ci = Females)


# Join the CI columns onto the table body, then unhide and label them
tbl %>%
  modify_table_body(
    left_join,
    myci,
    by = c("variable", "row_type", "label")
  ) %>%
  modify_table_header(
    Male_ci,
    hide = FALSE,
    label = "**95% CI Males**"
  ) %>%
  modify_table_header(
    Female_ci,
    hide = FALSE,
    label = "**95% CI Females**"
  )
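
The interval computed in `myci` above is the Wald (normal approximation) interval, whereas the function in the question used `prop.test()`, which returns a score interval with continuity correction by default. A small base R sketch of the difference follows; the helper name `wald_ci` and the counts (6 out of 15) are made up for illustration.

```r
# Hypothetical helper: Wald CI for a proportion, mirroring the
# p +/- qnorm(0.975) * sqrt(p * (1 - p) / N) expression used in the answer.
wald_ci <- function(n, N, level = 0.95) {
  p <- n / N
  z <- qnorm(1 - (1 - level) / 2)
  half_width <- z * sqrt(p * (1 - p) / N)
  c(lower = p - half_width, upper = p + half_width)
}

# Made-up example: 6 "Yes" responses out of 15.
wald <- wald_ci(6, 15)

# Interval from prop.test(), as used in the question's categorical_ci().
score <- as.numeric(prop.test(6, 15)$conf.int)

round(wald, 3)
round(score, 3)
```

The Wald interval is symmetric around the observed proportion and can extend below 0 or above 1 when counts are small or the proportion is near a boundary, so the two approaches may disagree noticeably on a sample of 30 rows.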

Many thanks @Daniel, you provided much-needed guidance promptly and clearly. – Moses, Mar 31 '21 at 16:16
