
I am running multivariable regressions on a list of outcome variables, each with the same set of independent variables. For univariable regression, I followed this example to use tbl_uvregression from gtsummary on a nested data frame. I am trying to generalize this to multivariable regression by using tbl_regression on a nested data frame, but when I try to unnest the tables, I get the error "the input must be a list of vectors." Below is what I have tried. I assume there is some small but critical step that I am missing, but I can't figure out what it is.

My desired output is a single table of multivariable regression output, with each model as a column and all the covariates as rows (similar to running tbl_merge on a list of these multivariable models, each fitted separately with tbl_regression).

library(tidyverse)
library(magrittr)
library(gtsummary)
library(broom)

id <- 1:2000
gender <- sample(0:1, 2000, replace = T)
age <- sample(17:64, 2000, replace = T)
race <- sample(0:1, 2000, replace = T)
health_score <- sample(0:25, 2000, replace = T)
cond_a <- sample(0:1, 2000, replace = T)
cond_b <- sample(0:1, 2000, replace = T)
cond_c <- sample(0:1, 2000, replace = T)
cond_d <- sample(0:1, 2000, replace = T)
df <- data.frame(id, gender, age, race, health_score, cond_a, cond_b, cond_c, cond_d)

regression_tables <- df %>% select(-id) %>% 
  gather(c(cond_a, cond_b, cond_c, cond_d), key = "condition", value = "case") %>% 
  group_by(condition) %>% nest() %>% 
  mutate(model = map(data, ~glm(case ~ gender + age + race + health_score, family = "binomial", data = .)),
         table = map(model, ~tbl_regression, exponentiate = T, conf.level = 0.99)) %>% 
  select(table) %>% unnest(table)
Kellan Baker
  • So why not stop at the `select` step and examine the structure of the result? – IRTFM Feb 20 '21 at 18:17
  • @IRTFM I did. And I don't know what to do with the fact that `table` is a list of functions. – Kellan Baker Feb 20 '21 at 18:35
  • Try replacing the last select/nest with `pull(table) %>% tbl_stack()`. Use the `group_header` argument of `tbl_stack` to get a header above each of the multivariable models. – Daniel D. Sjoberg Feb 20 '21 at 18:49
  • @DanielD.Sjoberg thank you so much! I was trying `pull` but on the model variable. Also, @akrun noticed the extra `~` in the call to `map(model, ~tbl_regression)` that was turning the output into an unusable table of functions. – Kellan Baker Feb 20 '21 at 21:24
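A sketch of the `pull()` + `tbl_stack()` route suggested in the comments, with the extra `~` removed. The simulated data, `set.seed()` value, and object names (`sim_df`, `nested`, `stacked`) are illustrative assumptions; `group_header` takes one label per stacked table.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(gtsummary)

# Hypothetical simulated data shaped like the question's (random 0/1 outcomes)
set.seed(1)
n <- 500
sim_df <- data.frame(
  gender = sample(0:1, n, replace = TRUE),
  age = sample(17:64, n, replace = TRUE),
  race = sample(0:1, n, replace = TRUE),
  health_score = sample(0:25, n, replace = TRUE),
  cond_a = sample(0:1, n, replace = TRUE),
  cond_b = sample(0:1, n, replace = TRUE)
)

nested <- sim_df %>%
  pivot_longer(starts_with("cond"), names_to = "condition", values_to = "case") %>%
  group_by(condition) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    model = map(data, ~ glm(case ~ gender + age + race + health_score,
                            family = "binomial", data = .x)),
    # No `~` here: tbl_regression is actually called on each model,
    # and exponentiate/conf.level are forwarded to it
    table = map(model, tbl_regression, exponentiate = TRUE, conf.level = 0.99)
  )

# pull() extracts the list column of tables; tbl_stack() stacks them
# vertically with one group header per outcome
stacked <- tbl_stack(pull(nested, table), group_header = nested$condition)
```

`tbl_stack()` puts the models one under another; for the side-by-side layout described in the question, `tbl_merge()` on the same list is the alternative.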

1 Answer


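The issue is the lambda expression (`~`): `~tbl_regression` creates an anonymous function that returns `tbl_regression` itself instead of calling it, so the extra arguments are silently dropped and each element of `table` is a function. Either pass the function directly, so that `exponentiate` and `conf.level` are forwarded to it, or call it explicitly inside the lambda. Also, there are no tidy methods (from broom) for extracting a `tbl_regression` object into a tibble.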
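A minimal purrr-only illustration of the difference, where toy numeric vectors and `sum` stand in for the fitted models and `tbl_regression` (the names `models`, `wrong`, `right1`, `right2` are hypothetical):

```r
library(purrr)

# Toy stand-ins for the list column of fitted models
models <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# Buggy: the formula lambda ~sum *returns* the function sum without calling it;
# the extra argument na.rm = TRUE is absorbed by the lambda's `...` and ignored
wrong <- map(models, ~sum, na.rm = TRUE)
is.function(wrong$a)

# Fix 1: pass the function itself; extra arguments are forwarded to it
right1 <- map(models, sum, na.rm = TRUE)

# Fix 2: keep the lambda but call the function on .x explicitly
right2 <- map(models, ~ sum(.x, na.rm = TRUE))

identical(right1, right2)
```

`wrong` has the same shape as the question's `table` column (a list of functions), which is why `unnest` could not handle it.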

library(dplyr)
library(tidyr)
library(purrr)  # provides map()
library(broom)
library(gtsummary)

out <- df %>%
  select(-id) %>%
  gather(c(cond_a, cond_b, cond_c, cond_d), key = "condition",
         value = "case") %>%
  group_by(condition) %>%
  nest() %>%
  mutate(model = map(data,
                     ~glm(case ~ gender + age + race + health_score,
                          family = "binomial", data = .)),
         # no `~`: tbl_regression is applied to each model
         table = map(model, tbl_regression, exponentiate = TRUE, conf.level = 0.99)) %>%
  select(table)

out$table[[1]]



In addition to the OP's approach of looping with `map`, we could simply fit the model and apply `tbl_regression` directly after `nest_by`, which returns a rowwise data frame. (I also replaced `gather` with `pivot_longer`: `gather` is superseded, and `pivot_longer` is its more general replacement.)

out <- df %>%
  select(-id) %>%
  pivot_longer(cols = starts_with('cond'),
               names_to = 'condition', values_to = 'case') %>%
  nest_by(condition) %>%
  mutate(model = list(glm(case ~ gender + age + race + health_score,
                          family = "binomial", data = data)),
         table_out = list(tbl_regression(model, exponentiate = TRUE, conf.level = 0.99)))

out
# A tibble: 4 x 4
# Rowwise:  condition
#  condition               data model  table_out 
#  <chr>     <list<tbl_df[,5]>> <list> <list>    
#1 cond_a           [2,000 × 5] <glm>  <tbl_rgrs>
#2 cond_b           [2,000 × 5] <glm>  <tbl_rgrs>
#3 cond_c           [2,000 × 5] <glm>  <tbl_rgrs>
#4 cond_d           [2,000 × 5] <glm>  <tbl_rgrs>

If a merged table is needed, apply `tbl_merge` to the list of `tbl_regression` objects:

tbl_merge(out$table_out)

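To get one labelled column group per model, as described in the question, `tbl_merge()` also accepts a `tab_spanner` argument. A self-contained sketch on hypothetical simulated data; the base-R `lapply()` loop with `reformulate()` is a swapped-in alternative to the nested data frame, not the approach used above:

```r
library(gtsummary)

# Hypothetical simulated data shaped like the question's (random 0/1 outcomes)
set.seed(42)
n <- 500
sim_df <- data.frame(
  gender = sample(0:1, n, replace = TRUE),
  age = sample(17:64, n, replace = TRUE),
  cond_a = sample(0:1, n, replace = TRUE),
  cond_b = sample(0:1, n, replace = TRUE)
)

outcomes <- c("cond_a", "cond_b")

# One tbl_regression per outcome, collected in a plain list
tbls <- lapply(outcomes, function(y) {
  fit <- glm(reformulate(c("gender", "age"), response = y),
             family = "binomial", data = sim_df)
  tbl_regression(fit, exponentiate = TRUE, conf.level = 0.99)
})

# tbl_merge() needs a list; tab_spanner labels each model's column group
merged <- tbl_merge(tbls, tab_spanner = outcomes)
```

With the answer's `out`, the equivalent call would be `tbl_merge(out$table_out, tab_spanner = out$condition)`.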

akrun
  • Thank you for this! I have to admit to being confused by the second method just using `list` to run through different models, but I appreciate that it moves me closer to a solution. With the resulting tibble, is there a way to turn that into a merged table using, e.g., `gtsummary::tbl_merge`, without manually extracting each individual table from out$table_out? I have been trying `lapply` without success. – Kellan Baker Feb 20 '21 at 19:02
  • @KellanBaker you can just do `tbl_merge(out$table_out)` – akrun Feb 20 '21 at 19:09
  • @KellanBaker Can you please check the update. The reason it didn't work was that `tbl_merge` expects a `list` as input – akrun Feb 20 '21 at 19:28
  • Sorry for the delay, the server in my secure environment is down, which is preventing me from logging back in. As soon as I can get in I will check on this. I really appreciate your help! – Kellan Baker Feb 20 '21 at 21:14
  • This worked, thank you! I finally understand what you mean about the unused `~` that turned the output into a function. Can you point me toward a resource that explains how/why I can just use `list` to cycle through the models in the nested data frame? I have been looking for an explanation of this but don't understand what `list` is doing here as a replacement for `map`. – Kellan Baker Feb 20 '21 at 21:28
  • @KellanBaker the reason I used `list` is that we are creating a new column in `mutate`, and the `model` output is a complex data structure (a list). Wrapping it in `list` stores each output as a single unit, because `mutate` by default expects a simple vector column. With `map`, you are essentially looping over each element (row), and the output is a list by default. Also, since we use `nest_by`, the data is grouped rowwise, so the model can be built directly on each row. – akrun Feb 20 '21 at 21:29
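The two styles described in that comment can be compared side by side. A small sketch on the built-in `mtcars` data, where `lm(mpg ~ wt)` is an arbitrary stand-in for the logistic models and `by_row`/`by_map` are hypothetical names: with `nest_by()` each row is evaluated separately, so `list()` only packages one model per cell; on an ordinary nested data frame, `map()` does the looping and returns a list itself.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rowwise: nest_by() makes `data` a single tibble per row;
# list() wraps each lm object so it fits in one cell of the new column
by_row <- mtcars %>%
  nest_by(cyl) %>%
  mutate(model = list(lm(mpg ~ wt, data = data)))

# Not rowwise: `data` is a whole list column, so map() loops over it
# and already returns a list of lm objects
by_map <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)))

# Both columns hold the same kind of object
is.list(by_row$model)
is.list(by_map$model)
```

Both routes fit the same per-group models; the choice is mostly about whether you prefer rowwise semantics or explicit iteration.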