I am trying to create a table with descriptive statistics for the whole sample as well as subgroups. My goal is to use the wonderful modelsummary R package to return one table with mean, sd, min, median, max, and graphs for the variables calculated for the entire sample as well as mean and sd for every group. I was able to achieve this with two separate tables. However, I would like to have all this information in a single table with the statistics about the entire sample first (see Fig. 1) and the subgroups second (see Fig. 2). If possible, I would also like to add the first-level heading for the whole sample and name it "All" or "Entire sample." Lastly, given that journals in my field require the use of the APA style, I wonder if the table can be turned into this format (e.g., with all the required borders, text in black instead of grey, etc.) (see Fig. 3). If modelsummary does not handle this, I am also open to trying other packages. Thanks so much to anyone who will help!

library(palmerpenguins)
library(tidyverse)
library(kableExtra)
library(modelsummary)

penguins <- penguins %>% as.data.frame() %>% select(species, bill_length_mm, bill_depth_mm,  flipper_length_mm, body_mass_g)

#scale variables for histogram and boxplot
pen_scaled <- penguins %>% select(bill_length_mm, bill_depth_mm,  flipper_length_mm, body_mass_g) %>% 
  mutate(across(where(is.numeric), ~scale(.))) %>% as.data.frame()

# create a list with individual variables and remove missing
pen_list <- lapply(pen_scaled, na.omit)

# create a table with `datasummary`
# add a histogram with column_spec and spec_hist
# add a boxplot with column_spec and spec_boxplot
emptycol <- function(x) " "
pen_table <- datasummary(All(penguins) ~ Mean + SD + Min + Median + Max + Heading("Boxplot") * emptycol + Heading("Histogram") * emptycol, data = penguins) %>%
    column_spec(column = 7, image = spec_boxplot(pen_list)) %>%
    column_spec(column = 8, image = spec_hist(pen_list))

pen_table

Figure 1

pen_table2 <- datasummary_balance(~species, data = penguins, dinm = FALSE)

pen_table2

Figure 2

Figure 3

Michael Matta

1 Answer

There are two questions here:

  1. How to change the appearance of the table?
  2. How to create a table with a given shape (yet to be specifically defined)?

Question 1

The appearance is highly configurable using the kableExtra package (modelsummary also supports gt, huxtable, and flextable through the output argument). An easy way to change the look is to use the kable_classic() function from kableExtra, as illustrated below. If you have more specific needs, please refer to the kableExtra documentation.
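As a minimal sketch of the styling step on its own, the snippet below builds a small table and applies kable_classic() to it. The built-in mtcars data is only a stand-in chosen for illustration, and output = "kableExtra" is passed explicitly on the assumption that the installed modelsummary version may not use kableExtra as its default backend.

```r
library(modelsummary)
library(kableExtra)

# Small stand-in table (mtcars is an assumption for illustration only).
# The kableExtra backend is requested explicitly so kable_classic() can be applied.
tab <- datasummary(mpg + hp ~ Mean + SD, data = mtcars, output = "kableExtra")

# Classic theme; extra arguments such as full_width are passed on to kable_styling()
tab_classic <- tab %>%
  kable_classic(full_width = FALSE, html_font = "Times New Roman")

tab_classic
```

Whether this is close enough to APA style depends on the journal; the other output backends may be worth trying if it is not.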

Question 2

As noted in the documentation for datasummary(), you can use a 1 to indicate the "full sample". Here is a minimal example:

library(palmerpenguins)
library(tidyverse)
library(kableExtra)
library(modelsummary)

penguins <- penguins %>% as.data.frame() %>% select(species, bill_length_mm, bill_depth_mm,  flipper_length_mm, body_mass_g)

# scale variables for histogram and boxplot
pen_scaled <- penguins %>% select(bill_length_mm, bill_depth_mm,  flipper_length_mm, body_mass_g) %>% 
  mutate(across(where(is.numeric), ~scale(.))) %>% as.data.frame()
pen_list <- lapply(pen_scaled, na.omit)

emptycol <- function(x) " "
datasummary(All(penguins) ~ Heading("Entire sample") * 1 * (Mean + SD + Min + Median + Max + Heading("Boxplot") * emptycol + Heading("Histogram") * emptycol) + species * (Mean + SD),
            data = penguins) %>%
    column_spec(column = 7, image = spec_boxplot(pen_list)) %>%
    column_spec(column = 8, image = spec_hist(pen_list)) %>%
    kable_classic()

Vincent
  • Thanks so much for your answer, Vincent! I apologise for being unclear. I updated Figure 3 (created in Microsoft Word) with my desired outcome. – Michael Matta Mar 11 '22 at 15:07
  • Sure. We just need to be a bit creative with the formula. See my edited answer. – Vincent Mar 11 '22 at 15:42
  • Hi Vincent, quick follow up on this answer. Is there a way to remove one or more groups from the right side of the table? In my data set, I have racial and ethnic groups instead of species of penguins. However, one of the groups is "Biracial or multiracial" and for the purpose of my work it does not make sense to show the demographics of that group. Is there a way to include those individuals in the left side of the table (where I estimate the statistics for the whole sample) and then remove it from the right side? – Michael Matta Apr 19 '22 at 21:37
  • Not 100% sure I understand, but perhaps you could create two different variables and use one on the left and the other on the right. You could set values to `NA` in one of the variables to omit whatever category you don't care about. That feels like a hack, obviously, but it seems like this might be the most straightforward approach. – Vincent Apr 19 '22 at 22:55
  • Thank you so much, Vincent. Your hack worked great. For future reference, here is the code to recreate this situation in your example above. I created a new variable named "species_2" without the Adelie category in the penguins dataset using `mutate(species_2 = na_if(species, "Adelie") %>% droplevels())` and then used this new variable in the `datasummary` function. – Michael Matta Apr 19 '22 at 23:28
  • Hi Vincent! I have another quick follow up on this question. Is there a way to remove the thin grey lines from the table? You can see my desired outcome here: https://haozhu233.github.io/kableExtra/awesome_table_in_html.html#Alternative_themes (second example). When I try to add `kbl() %>% kable_classic(html_font = "Times new roman")`, I get an error that says `Error in as.data.frame.default(x) : cannot coerce class ‘c("kableExtra", "knitr_kable")’ to a data.frame` I understand why this is an issue but I don't know how to solve it. – Michael Matta May 02 '22 at 21:33
  • By default `datasummary()` applies `kable_styling()` to all tables. It looks like `kable_classic()` does not undo prior styling, and only makes its own changes on top of them. One idea would be to use a global option to "hack" the [modelsummary theming system.](https://vincentarelbundock.github.io/modelsummary/articles/appearance.html#themes) Calling this will output raw `kableExtra`: `options("modelsummary_theme_kableExtra" = function(x, ...) return(x))` – Vincent May 02 '22 at 23:57
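The workaround from the comments for dropping one group from the right-hand side of the table can be sketched as follows. This is only an illustration under a few assumptions: the data frame name `pen` is hypothetical, base `replace()` is used here in place of the `na_if()` call quoted in the comment, and only two of the numeric variables are kept to shorten the example. The "Entire sample" columns still use every row, while the group columns are built from `species_2`, in which Adelie is set to `NA`.

```r
library(palmerpenguins)
library(dplyr)
library(modelsummary)

# Copy species into species_2, set the unwanted category to NA,
# and drop the now-empty factor level so it gets no column of its own.
pen <- penguins %>%
  as.data.frame() %>%
  select(species, bill_length_mm, body_mass_g) %>%
  mutate(species_2 = droplevels(replace(species, species == "Adelie", NA)))

# Left block: all rows (including Adelie). Right block: only the remaining groups.
datasummary(
  bill_length_mm + body_mass_g ~
    Heading("Entire sample") * 1 * (Mean + SD) + species_2 * (Mean + SD),
  data = pen
)
```

The same pattern should carry over to other grouping variables by changing the variable name and the category that is set to `NA`.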