
I have a list of 14 dependent variables for which I am running identical regression models (same model type, same independent variables). I'm using gather to get all of the dependent variables as a single outcome variable, running tbl_uvregression or tbl_regression on that variable, and then using tbl_stack from the gtsummary package to organize the output. I am trying to figure out how to name each table using the value of the outcome variable for each model. I understand that I can pass a list of names to tbl_stack(group_header), but I have noticed that's error-prone because I have to be careful of how the values in the outcome variable are arranged and then make sure I type them in the same order, and I've made enough mistakes that I'm worried about that approach. Is there a way to source the group_header arguments directly from the values of the dependent variable? The outcome variables are named, but of course that isn't preserved when I gather them to run the models.

library(tidyverse)
library(magrittr)
library(gtsummary)
library(broom)

id <- 1:2000
gender <- sample(0:1, 2000, replace = T)
age <- sample(17:64, 2000, replace = T)
race <- sample(0:1, 2000, replace = T)
health_score <- sample(0:25, 2000, replace = T)
cond_a <- sample(0:1, 2000, replace = T)
cond_b <- sample(0:1, 2000, replace = T)
cond_c <- sample(0:1, 2000, replace = T)
cond_d <- sample(0:1, 2000, replace = T)
df <- data.frame(id, gender, age, race, health_score, cond_a, cond_b, cond_c, cond_d)

regression_tables <- df %>% select(-id) %>% 
  gather(c(cond_a, cond_b, cond_c, cond_d), key = "outcome", value = "case") %>% 
  group_by(outcome) %>% nest() %>% 
  mutate(model = map(data, ~glm(case ~ gender + age + race + health_score, family = "binomial", data = .)), 
         table = map(model, tbl_regression, exponentiate = T, conf.level = 0.99)) %>% 
  pull(table) %>% tbl_stack(**[model names to become table headers]**)

In this example, I would like stacked tables where the header for each table is "Condition A", "Condition B", "Condition C", "Condition D" (the values of the gathered outcome variable). The two column headings ("Adults" and "Children" in the example screenshot) will come from running the models separately for adults and children, stacking them as described above, and then using tbl_merge.

[Image: example of final output]

Kellan Baker

1 Answer


I cannot run the code in the post as written; the table = map(model, ~ .. part throws some odd output.

If you look at the tibble you have before the pull, using the code below:

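# Same pipeline as in the question, stopped before pull() so the outcome column is kept
regression_tables <- df %>% select(-id) %>%
  gather(c(cond_a, cond_b, cond_c, cond_d), key = "outcome", value = "case") %>%
  group_by(outcome) %>% nest() %>%
  mutate(
    # one logistic regression per outcome, then one gtsummary table per model
    model = map(data, ~ glm(case ~ gender + age + race + health_score, family = "binomial", data = .)),
    table = map(model, tbl_regression, exponentiate = TRUE, conf.level = 0.99)
  )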

You can see that there is a corresponding outcome column by which your results are nested:

# A tibble: 4 x 4
# Groups:   outcome [4]
  outcome data                 model  table     
  <chr>   <list>               <list> <list>    
1 cond_a  <tibble [2,000 × 5]> <glm>  <tbl_rgrs>
2 cond_b  <tibble [2,000 × 5]> <glm>  <tbl_rgrs>
3 cond_c  <tibble [2,000 × 5]> <glm>  <tbl_rgrs>
4 cond_d  <tibble [2,000 × 5]> <glm>  <tbl_rgrs>
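To make the structure of that tibble concrete, here is a minimal sketch of how a list column behaves when it is extracted. The `nested` tibble and the `lm()` models on `mtcars` are stand-ins invented for illustration, not the models from the question. It shows that `$` and `pull()` both return the same plain list, in the same row order as the `outcome` column.

```r
library(tidyverse)

# Hypothetical stand-in for the nested tibble above:
# a character column plus a list column holding one model per row.
nested <- tibble(
  outcome = c("cond_a", "cond_b"),
  model   = list(lm(mpg ~ wt, data = mtcars), lm(mpg ~ hp, data = mtcars))
)

from_dollar <- nested$model            # extract the list column with `$`
from_pull   <- nested %>% pull(model)  # extract the same column with pull()

is.list(from_dollar)               # TRUE: a plain list
is.data.frame(from_dollar)         # FALSE: not a tibble or data frame
identical(from_dollar, from_pull)  # TRUE: both routes give the same object
length(from_dollar) == length(nested$outcome)
```

Because both columns come from the same rows, element i of the list always corresponds to element i of `outcome`.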

We can just stack it like this:

tbl_stack(regression_tables$table, group_header = regression_tables$outcome)
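The headers produced this way are the raw values `cond_a`, `cond_b`, and so on. If you want display labels such as "Condition A" instead, one option is to look them up by name rather than typing them in order. The sketch below is an assumption-laden illustration: the `outcome_labels` lookup vector is hypothetical, and the data are a smaller simulated set with only two outcomes.

```r
library(tidyverse)
library(gtsummary)

set.seed(1)
n <- 200
# Small simulated data set (illustrative only, two outcomes instead of four)
df <- data.frame(
  gender       = sample(0:1, n, replace = TRUE),
  age          = sample(17:64, n, replace = TRUE),
  race         = sample(0:1, n, replace = TRUE),
  health_score = sample(0:25, n, replace = TRUE),
  cond_a       = sample(0:1, n, replace = TRUE),
  cond_b       = sample(0:1, n, replace = TRUE)
)

regression_tables <- df %>%
  gather(c(cond_a, cond_b), key = "outcome", value = "case") %>%
  group_by(outcome) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    model = map(data, ~ glm(case ~ gender + age + race + health_score,
                            family = "binomial", data = .x)),
    table = map(model, tbl_regression, exponentiate = TRUE, conf.level = 0.99)
  )

# Hypothetical lookup: names are the outcome values, values are display labels
outcome_labels <- c(cond_a = "Condition A", cond_b = "Condition B")

# Index by name, so the labels follow whatever row order the tibble has
headers <- unname(outcome_labels[regression_tables$outcome])
stopifnot(!anyNA(headers))  # fail loudly if an outcome has no label

stacked <- tbl_stack(regression_tables$table, group_header = headers)
```

Indexing the lookup vector by `regression_tables$outcome` means a reordered or filtered tibble still gets the right header on each table, and the `stopifnot()` catches an outcome that was left out of the lookup.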


StupidWolf
  • This looks like the perfect solution to me as well. I would also suggest that you first merge the cond_a models for adults and children, then do the same for cond_b, etc. Lastly, stack them. Otherwise you can run into merging issues because the same variable names are repeated in the models. – Daniel D. Sjoberg Mar 07 '21 at 18:21
  • Apologies for the extraneous `~` in the example - edited to remove. When I initially tried your solution, I got the error "Expecting tbls to be a list," and it took me a while to figure out that I needed to not include `pull`. I don't understand why this is, as without `pull` it's a tibble, not a list. Why do I not need `pull` here? – Kellan Baker Mar 07 '21 at 18:55
  • I am simply extracting the column with `$`, which returns the list itself. It is equivalent to `tbl_stack(regression_tables %>% pull(table), group_header = regression_tables %>% pull(outcome))` – StupidWolf Mar 07 '21 at 19:09
  • In this case, once you execute pull(), you lose the other column that tells you the outcome. With something like `regression_tables %>% pull(table) %>% tbl_stack(group_header=???)` you can see it's hard to put the outcome back in – StupidWolf Mar 07 '21 at 19:12
  • @StupidWolf it makes sense that `pull` loses the outcome column. It's the behavior of `tbl_stack` that's confusing to me. It asks for a list, but for this approach to work, it accepts only a tibble. I guess it has to do with the structure of the tibble, with lists in columns? – Kellan Baker Mar 07 '21 at 19:24
  • @DanielD.Sjoberg thank you for the heads up. I am indeed having trouble with this and cannot figure out how to operationalize your suggested approach, since all the condition models are run together for adults, then separately for children. There is an additional wrinkle in that the list of models for children has one fewer model, because `tbl_regression` threw an error over the bad fit ("need at least two non-NA values to interpolate"). Should I ask this question in a separate post? – Kellan Baker Mar 07 '21 at 19:28
  • I added the merging code here: https://gist.github.com/ddsjoberg/0389dc197db00c1d7f0aee486b2c575d – Daniel D. Sjoberg Mar 07 '21 at 19:37
  • @StupidWolf Do you know how your solution would apply to `tbl_merge`? Those headers seem to need to be added earlier, at the individual table stage (i.e., within the `mutate` step), and I can't get the syntax right to use the `outcome` variable as a header for each table as I go before using `tbl_merge` on them. – Kellan Baker Mar 07 '21 at 23:17
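As a rough illustration of the merge-then-stack order suggested in the comments, here is a minimal sketch. It is not the code from the linked gist. The simulated data, the age cutoff of 18, the `age_group` column, and the reduced covariate set are all assumptions made for the example. Each outcome's adult and child tables are merged side by side first, and the merged tables are then stacked with the outcome as the header.

```r
library(tidyverse)
library(gtsummary)

set.seed(1)
n <- 400
# Simulated data (illustrative only); age_group is a hypothetical split at 18
df <- data.frame(
  age          = sample(5:64, n, replace = TRUE),
  gender       = sample(0:1, n, replace = TRUE),
  health_score = sample(0:25, n, replace = TRUE),
  cond_a       = sample(0:1, n, replace = TRUE),
  cond_b       = sample(0:1, n, replace = TRUE)
)
df$age_group <- ifelse(df$age >= 18, "Adults", "Children")

# One model and one table per outcome x age group
tables <- df %>%
  gather(c(cond_a, cond_b), key = "outcome", value = "case") %>%
  group_by(outcome, age_group) %>%
  nest() %>%
  ungroup() %>%
  arrange(outcome, age_group) %>%
  mutate(
    model = map(data, ~ glm(case ~ gender + health_score,
                            family = "binomial", data = .x)),
    table = map(model, tbl_regression, exponentiate = TRUE)
  )

# Merge within each outcome; the spanner labels come from age_group,
# so they cannot drift out of sync with the tables
merged <- tables %>%
  group_by(outcome) %>%
  summarise(
    merged = list(tbl_merge(table, tab_spanner = age_group)),
    .groups = "drop"
  )

# Stack the merged tables, using the outcome column as the group header
final <- tbl_stack(merged$merged, group_header = merged$outcome)
```

Keeping `outcome` and `age_group` as columns until the last step means both the spanner labels and the group headers are read from the data rather than typed by hand.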