Best way to report multiple regression models on several dimensions (evolving model formulation and year of data)

Question

Situation

I am fitting a series of evolving regression models. For the purposes of this question, we can think of these models in terms of Model A, Model B, and Model C. All models share at least one same covariate.

I am also fitting these models for two separate years of data. Again, for the purposes of this question, the years will be 2000 and 2010.

In an attempt to simplify the reporting of results, I am attempting to combine the reporting of the regressions into a single table that would have some kind of the following format:


                     2000        2010
Model A

    Coef Ex1
 
Model B

    Coef Ex1

    Coef Ex2

Model C

    Coef Ex1

    Coef Ex2

    Coef Ex3

The idea being that someone can look quickly at Coef Ex1 across several models and years.

What Have I Tried

I have tried to achieve the above table using both R stargazer and kable packages. With stargazer I can get the fully formatted table for a single model formulation across many years (e.g., stargazer(modelA2000, modelA2010), but I cannot figure out how to stack additional model formulations on the rows.

For kable I have been able to stack horizontal models, but I have not been able to add in additional years (e.g., coefs <- bind_rows(tidy(modelA2000), tidy(modelB2000), tidy(modelC2000)); coefs %>% kable()).

Question: how can I use stargazer or kable to report evolving regression models (which share the same covariates) in the rows but also with year of cross section on the column? I think I can somehow extend the answer posted here, although I'm not sure how.

Reproducible example


# Load the data
mtcars <- mtcars

# Create example results for models A, B, and C for 2000
modelA2000 <- lm(mpg ~ cyl, data = mtcars)
modelB2000 <- lm(mpg ~ cyl + wt, data = mtcars)
modelC2000 <- lm(mpg ~ cyl + wt + disp, data = mtcars)

# Slightly modify data for second set of results
mtcars$cyl <- mtcars$cyl*runif(1)

# Fit second set of results. Same models, pretending it's a different year. 
modelA2010 <- lm(mpg ~ cyl, data = mtcars)
modelB2010 <- lm(mpg ~ cyl + wt, data = mtcars)
modelC2010 <- lm(mpg ~ cyl + wt + disp, data = mtcars)

Vincent · Accepted Answer · 2021-09-29T16:38:32.847

Two notes before starting:

You want a pretty "custom" table, so it is almost inevitable that some manual operations will be required.
My answer relies on the development version of modelsummary, which you can install like this:

library(remotes)
install_github("vincentarelbundock/modelsummary")

We will need 4 concepts, many of them related to the broom package:

broom::tidy a function that takes a statistical model and returns a data.frame of estimates with one row per coefficient.
broom::glance a function that takes a statistical model and returns a one-row data.frame with model characteristics (e.g., number of observations)
modelsummary_list a list with 2 elements called "tidy" and "glance", and with a class name of "modelsummary_list".

The modelsummary package allows you to draw regression tables. Under the hood, it uses broom::tidy and broom::glance to extract information from those models. Users can also supply their own information about a model by supplying a list to which we assign the class modelsummary_list, as documented here.

EDIT: The recommended way to do this in modelsummary is now to use the group argument. Scroll to the end of this post for illustrative code.

Obsolete example with useful discussion

The modelsummary_wide is a function that was initially designed to "stack" results from several models with several groups of coefficients. This is useful for things like multinomial models, but it also helps us in your case, where you have multiple models in multiple groups (here: years).

First, we load packages, tweak the data, and estimate our models:

library(modelsummary)
library(broom)
library(dplyr)

mtcars2010 <- mtcars
mtcars2010$cyl <- mtcars$cyl * runif(1)

models <- list(
  "A" = list(
    lm(mpg ~ cyl, data = mtcars),
    lm(mpg ~ cyl, data = mtcars2010)),
  "B" = list(
    lm(mpg ~ cyl + wt, data = mtcars),
    lm(mpg ~ cyl + wt, data = mtcars2010)),
  "C" = list(
    lm(mpg ~ cyl + wt + disp, data = mtcars),
    lm(mpg ~ cyl + wt + disp, data = mtcars2010)))

Notice that we saved our models in three groups, in a list of list.

Then, we define a tidy_model function that accepts a list of two models (one per year), combines the information on those two models, and creates a modelsummary_list object (again, please refer to the documentation). Note that we assign the "year" information to a "group" column in the tidy object.

We apply this function to each of our three groups of models using lapply.

tidy_model <- function(model_list) {
    # tidy estimates
    tidy_2000 <- broom::tidy(model_list[[1]])
    tidy_2010 <- broom::tidy(model_list[[2]])
    # create a "group" column
    tidy_2000$group <- 2000
    tidy_2010$group <- 2010
    ti <- bind_rows(tidy_2000, tidy_2010)
    # glance estimates
    gl <- data.frame("N" = stats::nobs(model_list[[1]]))
    # output
    out <- list(tidy = ti, glance = gl)
    class(out) <- "modelsummary_list"
    return(out)
}

models <- lapply(models, tidy_model)

Finally, we call the modelsummary_wide with the stacking="vertical" argument to obtain this table:

modelsummary_wide(models, stacking = "vertical")

Of course, the table can be adjusted, coefficients renamed, etc. using the other arguments of the modelsummary_wide function or with kableExtra or some other package supported by the output argument.

More modern example without detailed explanation

library("modelsummary")
library("broom")
library("quantreg")


mtcars2010 <- mtcars
mtcars2010$cyl <- mtcars$cyl * runif(1)

models <- list(
  "A" = list(
    "2000" = rq(mpg ~ cyl, data = mtcars),
    "2010" = rq(mpg ~ cyl, data = mtcars2010)),
  "B" = list(
    "2000" = rq(mpg ~ cyl + wt, data = mtcars),
    "2010" = rq(mpg ~ cyl + wt, data = mtcars2010)),
  "C" = list(
    "2000" = rq(mpg ~ cyl + wt + disp, data = mtcars),
    "2010" = rq(mpg ~ cyl + wt + disp, data = mtcars2010)))

tidy_model <- function(model_list) {
    # tidy estimates
    tidy_2000 <- broom::tidy(model_list[[1]])
    tidy_2010 <- broom::tidy(model_list[[2]])
    # create a "group" column
    tidy_2000$group <- "2000"
    tidy_2010$group <- "2010"
    ti <- bind_rows(tidy_2000, tidy_2010)
    # output
    out <- list(tidy = ti, glance = data.frame("nobs 2010" = length(model_list[[1]]$fitted.values)))
    class(out) <- "modelsummary_list"
    return(out)
}

models <- lapply(models, tidy_model)

modelsummary(models, 
             group = model + term ~ group, 
             statistic = "conf.int")

		2000	2010
A	(Intercept)	36.800	36.800
		[30.034, 42.403]	[30.034, 42.403]
	cyl	-2.700	-67.944
		[-3.465, -1.792]	[-87.204, -45.102]
B	(Intercept)	38.871	38.871
		[30.972, 42.896]	[30.972, 42.896]
	cyl	-1.743	-43.858
		[-2.154, -0.535]	[-54.215, -13.472]
	wt	-2.679	-2.679
		[-5.313, -1.531]	[-5.313, -1.531]
C	(Intercept)	40.683	40.683
		[31.235, 47.507]	[31.235, 47.507]
	cyl	-1.993	-50.162
		[-3.137, -1.322]	[-78.948, -33.258]
	wt	-2.937	-2.937
		[-5.443, -1.362]	[-5.443, -1.362]
	disp	0.003	0.003
		[-0.009, 0.035]	[-0.009, 0.035]

That's it! Ah, of course - `modelsummary`! I should have remembered. Thanks so much. :) — whatwhatWHAT, Mar 23 '21 at 14:27
is there any reason this solution wouldn't work with the option `exponentiate=T`? I've added a few lines to manually exponentiate the coefs, but not sure how to get it to play nicely with exponentiate confidence intervals rendered by `exp(confint(examplemodel))` — whatwhatWHAT, Mar 26 '21 at 21:08
I think I've got a jenky hotfix going...manually exponentiating each sub model to get confidence intervals, and inserting that into the tidy estimates as conf.low and conf.high. This lets me pass them as part of the `estimate` argument of `modelsummary_wide()` — whatwhatWHAT, Mar 26 '21 at 21:15
Normally, `modelsummary` handles this for you by feeding the `exponentiate` argument to `broom::tidy` internally. But remember that, in this case, we had to hack our way to the table by calling `broom::tidy` ourselves in the `tidy_model` function. The data comes to us pre-formatted, and `modelsummary` doesn't attempt to do anything to it. You need to add `exponentiate=TRUE` to the `broom` calls in `tidy_model`. — Vincent, Mar 27 '21 at 11:41
sorry to necro this post - I have been telling everyone about modelsummary and people are enjoying it! However, this solution no longer works with the 0.9.0+ version of modelsummary, correct? I am getting an error that tidy does not accept the `qr` model object. I will keep playing around with the `group` parameter! — whatwhatWHAT, Sep 29 '21 at 15:28
You're right. The recommeded way to do this now is to use the `group` argument. I put an example in an edit. — Vincent, Sep 29 '21 at 16:38

Best way to report multiple regression models on several dimensions (evolving model formulation and year of data)

1 Answers1

Obsolete example with useful discussion

More modern example without detailed explanation