Create a table of summary statistics (with p.value) with sub-levels (long list)

Question

I need to run an inferential analysis across 21 countries, comparing a numeric result between genders. I have already built a long-format (pivoted) dataset with the variables Gender, Country, and Results (numeric). I am using gtsummary::tbl_strata and gtsummary::tbl_summary, but I could not set up the nesting so that each country is analyzed individually. The output also returns n (%) counts for the countries (the table comes out in wide format) and computes the Results variable overall instead of per country. I have put the tabular structure I want below.

I could even generate individual tables and stack them. However, I would like a more rational strategy.

Code

library(tidyverse)
library(gtsummary)

# dataframe
df <- 
  data.frame(
    Country = c("Country 1", "Country 2", "Country 3", 
               "Country 1", "Country 2", "Country 3",
               "Country 1", "Country 2", "Country 3",
               "Country 1", "Country 2", "Country 3"),
    Gender = c("M", "M", "M",
                "W", "W", "W",
               "M", "M", "M",
               "W", "W", "W"), 
    Results = c(53, 67, 48,
          56, 58, 72, 
          78, 63, 67,
          54,49,62))
df

# Table
Table <- df %>%
  select(c('Gender',
           'Country',
           'Results')) %>%
  tbl_strata(
    strata = Country,
    .tbl_fun =
      ~ .x %>%
        tbl_summary(by = Gender,
                    missing = "no") %>%
        bold_labels() %>%
        italicize_levels() %>%
        italicize_labels()
  )
Table

score 1 · Accepted Answer · answered Mar 05 '21 at 16:07

Here's how you can get that table:

remotes::install_github("ddsjoberg/gtsummary")
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.3.7.9004'
library(tidyverse)

df <- 
  data.frame(
    Country = c("Country 1", "Country 2", "Country 3", 
                "Country 1", "Country 2", "Country 3",
                "Country 1", "Country 2", "Country 3",
                "Country 1", "Country 2", "Country 3"),
    Gender = c("M", "M", "M",
               "W", "W", "W",
               "M", "M", "M",
               "W", "W", "W"), 
    Results = c(53, 67, 48,
                56, 58, 72, 
                78, 63, 67,
                54,49,62))


theme_gtsummary_mean_sd()
tbl <-
  df %>%
  nest(data = -Country) %>%
  rowwise() %>%
  mutate(
    tbl = 
      data %>%
      tbl_summary(
        by = Gender,
        type = Results ~ "continuous",
        statistic = Results ~ "{mean} ± {sd}",
        label = list(Results = Country)
      ) %>%
      add_p() %>%
      modify_header(list(
        label ~ "**Country**",
        all_stat_cols() ~ "**{level}**"
      )) %>%
      list()
  ) %>%
  pull(tbl) %>%
  tbl_stack() %>%
  modify_spanning_header(all_stat_cols() ~ "**Gender**")

Created on 2021-03-05 by the reprex package (v1.0.0)
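As a sanity check on the p-values in the stacked table, the per-country comparison can be reproduced in base R. This is only a sketch, and it assumes that `theme_gtsummary_mean_sd()` makes `add_p()` use a t-test for continuous variables (R's `t.test()` defaults to the Welch variant). The name `welch_p` is illustrative and not part of the answer.

```r
# Same toy data as in the answer above
df <- data.frame(
  Country = rep(c("Country 1", "Country 2", "Country 3"), times = 4),
  Gender  = rep(c("M", "W", "M", "W"), each = 3),
  Results = c(53, 67, 48,
              56, 58, 72,
              78, 63, 67,
              54, 49, 62)
)

# Split the data by country and run a Welch two-sample t-test of
# Results by Gender within each country, keeping only the p-value.
welch_p <- sapply(
  split(df, df$Country),
  function(d) t.test(Results ~ Gender, data = d)$p.value
)

# Named numeric vector: one p-value per country
welch_p
```

If the values differ from the table, the most likely reason is that a different test is in effect in `add_p()`.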

Daniel D. Sjoberg

Very good, Daniel. I have been bugging you a bit! However, every day I invest more energy in learning R and using gtsummary for my analyses. This is fantastic! – Cristiano Mar 05 '21 at 16:52
Hi Daniel, how are you? I wasn't able to work on the routine last week. Now that I'm picking it up again, I noticed that when I use my real database, the country names are not being "copied": the table shows a code number and an "Unknown" sub-row instead. I tried converting the df object from tibble to data.frame format, but it didn't work. Can you please guide me? – Cristiano Mar 11 '21 at 15:02
You'll need to create a _minimal_ reproducible example. – Daniel D. Sjoberg Mar 11 '21 at 15:29
It would be the same routine as above. However, the real df has 21 countries; in the example in my question I only put three. Maybe the problem in the actual analysis is that I converted the data from wide to long format, so the output is a tibble. In the example I posted and you processed, the df is built as a data.frame from the start. For my actual df I created a new object with as.data.frame and ran the routine you suggested, but the error I described in the previous comment still occurs. Thanks – Cristiano Mar 11 '21 at 15:51
There must be more differences than you've outlined above. My first guess is that country is saved as a factor in your data and not character. Please update the example to be just like your data. – Daniel D. Sjoberg Mar 11 '21 at 16:13
Hi Daniel, your hypothesis has been confirmed. I converted Country from factor to character. The output (tbl) now shows all countries, but an "Unknown" sub-row is still generated below the result line for each one. I tested the routine you posted with the df created both as a data.frame (as.data.frame) and as a tibble. I also tested selecting the variables (dplyr::select) into a separate object before running gtsummary::tbl_summary in the pipe. All of them worked. I can share a few lines from my original df. However, how can I upload it without having to type it out by hand? – Cristiano Mar 11 '21 at 17:21
Ah, maybe one more useful piece of information: in my real data the variable called "Results" (in the example above) is stored as a discrete number; in my routine the other variable of interest is HA (same thing, but continuous numeric). It generates this message: There was an error in 'add_p()/add_difference()' for variable 'HA', p-value omitted: Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels – Cristiano Mar 11 '21 at 17:54
add `missing = "no"` to the `tbl_summary()` call – Daniel D. Sjoberg Mar 11 '21 at 18:12
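The two fixes from the comments so far (store Country as character, and pass `missing = "no"`) can be combined with the answer's pipeline roughly as follows. This is a sketch on made-up data: `df_real` and `tbl_fixed` are hypothetical names, and the factor column and the single `NA` only imitate the situation described in the comments. `add_p()` is left out to keep the example focused on the labels and the "Unknown" row.

```r
library(dplyr)
library(tidyr)
library(gtsummary)

# Hypothetical stand-in for the real data: Country is a factor and
# one Results value is missing.
df_real <- data.frame(
  Country = factor(rep(c("Country 1", "Country 2", "Country 3"), times = 4)),
  Gender  = rep(c("M", "W", "M", "W"), each = 3),
  Results = c(53, 67, 48,
              56, 58, 72,
              78, 63, NA,
              54, 49, 62)
)

tbl_fixed <-
  df_real %>%
  # Convert the factor to character so the label shows the country name
  mutate(Country = as.character(Country)) %>%
  nest(data = -Country) %>%
  rowwise() %>%
  mutate(
    tbl =
      data %>%
      tbl_summary(
        by = Gender,
        type = Results ~ "continuous",
        statistic = Results ~ "{mean} ± {sd}",
        label = list(Results = Country),
        missing = "no"  # suppress the "Unknown" (missing values) row
      ) %>%
      list()
  ) %>%
  pull(tbl) %>%
  tbl_stack()

tbl_fixed
```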
Hi Daniel. I need to modify the test used: Welch to Wilcoxon (paired). I entered this command: add_p(test = Results ~ stats::wilcox.test(paired = TRUE)), but it returns the error: Error: Problem with `mutate()` input `tbl`. x argument `x` is missing, no pattern i Input `tbl` is ``%>%`(...)`. i The error occurred in row 1. – Cristiano Mar 18 '21 at 13:54
Have you reviewed the documentation online? That's not how to specify the test – Daniel D. Sjoberg Mar 18 '21 at 17:10
http://www.danieldsjoberg.com/gtsummary/reference/tests.html – Daniel D. Sjoberg Mar 18 '21 at 17:21
Hi Daniel. I have read the details you suggested and got it working by writing a custom function. If it is possible with arguments already built into gtsummary, let me know how to write it inline. I tested "paired.wilcox.test", but I did not succeed! – Cristiano Mar 19 '21 at 12:10
There's an example in the table gallery vignette – Daniel D. Sjoberg Mar 20 '21 at 00:58
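For reference, a minimal sketch of a built-in paired test, assuming the documented pattern in which `add_p()` takes the test name as a string and a `group` argument that identifies the paired observations. `PairID` is a hypothetical column: the example data in the question has no pairing identifier, so the pairing below is arbitrary and only illustrates the syntax. The object names `df_paired` and `tbl_paired` are also illustrative.

```r
library(dplyr)
library(gtsummary)

# Hypothetical paired layout: each PairID has exactly one M row and one W row.
# The Results values are taken from the question; the pairing is made up.
df_paired <- data.frame(
  PairID  = rep(1:6, times = 2),
  Gender  = rep(c("M", "W"), each = 6),
  Results = c(53, 67, 48, 78, 63, 67,
              54, 58, 72, 56, 49, 62)
)

tbl_paired <-
  df_paired %>%
  tbl_summary(
    by = Gender,
    include = -PairID,             # do not summarize the id column
    type = Results ~ "continuous",
    missing = "no"
  ) %>%
  # The test is given by name; `group` tells add_p() which rows are paired
  add_p(test = Results ~ "paired.wilcox.test", group = PairID)

tbl_paired
```

For the per-country table, the same `add_p()` call would replace the plain `add_p()` inside the `mutate()` step, provided the nested data keeps the pairing column.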
Thanks Daniel. For me, everything answered was satisfactory. It is reproducible! – Cristiano Mar 23 '21 at 15:08