
I'm trying to produce a table of multiple models with standardized coefficients, color-coded and sized by coefficient magnitude. I'll be doing this with dozens of models, and color and size seem like a good way to show patterns across predictors and models. Something like this example from {corrplot} is what I have in mind, but I'd only need one color.

I'd also like cells with p-values > .05 to be blank or very faint.

Here is the example I'm working with.

library(dplyr)
library(modelsummary)

dat <- mtcars %>% mutate(
  cyl = as.factor(cyl),
  gear = as.factor(gear)
)

## List of models
models = list(
  "MPG" = lm(mpg ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat),
  "Disp" = lm(disp ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat),
  "Drat" = lm(drat ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat)
)

## I feed the list of models to modelsummary() and ask for only coefficients and p.value as side by side columns
models.ep <- modelsummary(models,
  standardize = "basic",
  shape = term ~ model + statistic,
  estimate = "{estimate}",
  statistic = "p.value",
  gof_map = NA,
  output = "data.frame")

## I started with {formattable}, but the color bars aren't what I want. I'm after something like the image above, where the size and darkness/lightness of the circles represent effect magnitude.
library(formattable)
formattable(models.ep, list(
  "MPG / Est." = color_bar("#e9c46a"),
  "Disp / Est." = color_bar("#80ed99"),
  "Drat / Est." = color_bar("#f28482")
))

## I also looked around in {flextable} but did not see a way to do what I need.
tci

1 Answer


You may be able to achieve something similar to this with the get_estimates() function from modelsummary and the ggplot2 package. This is not exactly the image you gave, but it may help you get started:

library(ggplot2)
library(modelsummary)

dat <- mtcars |> transform(
  cyl = as.factor(cyl),
  gear = as.factor(gear)
)

models = list(
  "MPG" = lm(mpg ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat),
  "Disp" = lm(disp ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat),
  "Drat" = lm(drat ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat)
)

results <- lapply(models, get_estimates)
results <- lapply(names(results), \(n) transform(results[[n]], model = n))
results <- do.call("rbind", results)

ggplot(results, aes(x = model, y = term, size = estimate, color = p.value)) +
    geom_point() +
    theme_minimal() +
    theme(panel.grid = element_blank()) +
    labs(x = "", y = "")
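
To get closer to what the question asks for, one possible extension is to map point size and shade to the absolute value of the estimate and to fade out terms with p-values of .05 or more. This is only a sketch under a few assumptions: that `get_estimates()` forwards `standardize = "basic"` to the `parameters` package the same way `modelsummary()` does, that dropping the intercept is acceptable, and that the 0.05 cutoff, the `significant` column, and the two hex colors are illustrative choices rather than anything required by either package.

```r
library(ggplot2)
library(modelsummary)

dat <- transform(mtcars, cyl = as.factor(cyl), gear = as.factor(gear))

models <- list(
  "MPG" = lm(mpg ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat),
  "Disp" = lm(disp ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat),
  "Drat" = lm(drat ~ cyl + hp + wt + qsec + vs + am + gear + carb, data = dat)
)

# Assumption: `standardize` is passed through `...` to the parameters package
results <- lapply(names(models), \(n) {
  transform(get_estimates(models[[n]], standardize = "basic"), model = n)
})
results <- do.call("rbind", results)

# Drop the intercept and keep the models in their original order
results <- subset(results, term != "(Intercept)")
results$model <- factor(results$model, levels = names(models))

# Hypothetical helper column: TRUE when p < .05
results$significant <- results$p.value < 0.05

p <- ggplot(results, aes(x = model, y = term)) +
  # Size and shade both follow |estimate|; alpha fades non-significant terms
  geom_point(aes(size = abs(estimate), color = abs(estimate), alpha = significant)) +
  scale_color_gradient(low = "#d8f3dc", high = "#1b4332") +
  scale_alpha_manual(values = c(`TRUE` = 1, `FALSE` = 0.15), guide = "none") +
  scale_x_discrete(position = "top") +
  theme_minimal() +
  theme(panel.grid = element_blank()) +
  labs(x = "", y = "", size = "|estimate|", color = "|estimate|")
p
```

Setting the `FALSE` alpha value to 0 would blank those cells entirely instead of fading them.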

Vincent
  • Thanks, @vincent, this is great! Is there an easy way to have the model outcome names appear at the top of each column like they do in modelsummary(): "MPG", "Disp", "Drat"? – tci Oct 09 '22 at 19:43
  • These will all be `ggplot2` options, with many questions/answers on SO and in the docs. For instance: https://stackoverflow.com/questions/26838005/putting-x-axis-at-top-of-ggplot2-chart – Vincent Oct 09 '22 at 20:00
  • Thanks for pointing me in the right direction! This did it, and it is so simple: `+ scale_x_discrete(position = "top")` – tci Oct 09 '22 at 21:18
  • This step takes a really long time (20-30 minutes) every time I run it on larger datasets: `results <- lapply(models, get_estimates)` Is there any way to speed it up? – tci Oct 13 '22 at 13:38
  • I think if you update the `parameters` package (a dependency) and `modelsummary` to their development versions, you will likely get a 10-20x speedup. See here: https://github.com/vincentarelbundock/modelsummary/issues/562#issuecomment-1274891982 – Vincent Oct 14 '22 at 14:54