
I want to export the following table, which mixes numeric and factor variables and is built with `datasummary`, to PDF via R Markdown:

---
title: "R Notebook"
output:
  html_document:
    df_print: paged
  html_notebook: default
  pdf_document: default
---

Table 1 example:

```{r, warning=FALSE, message=FALSE, echo=FALSE}
library(tidyverse)
library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- "A"
tmp$region[15:20] <- "B"
tmp$region[21:32] <- "C"
tmp$region <- as.factor(tmp$region)

## change the order of the variables
tmp <- tmp[,c("mpg","class","region","hp")]

# create a list with individual variables
# remove missing and rescale
tmp_scaled <- tmp
tmp_scaled$mpg <- scale(tmp_scaled$mpg)
tmp_scaled$hp <- scale(tmp_scaled$hp)

tmp_scaled_list <- lapply(tmp_scaled, na.omit)

tmp_scaled_list[2] <- list(NULL)
tmp_scaled_list[3] <- list(NULL)

N_alt <- function(x) paste0(N(x), ' (', round((as.numeric(N(x))/32)*100,digits=1), ')')

# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + class + region + hp ~ Heading("N (%)") * N_alt + Mean + SD + Heading("Boxplot") * emptycol + Heading("Histogram") * emptycol, data = tmp) %>%
  column_spec(column = 6, image = spec_boxplot(tmp_scaled_list[c(1,4)])) %>%
  column_spec(column = 7, image = spec_hist(tmp_scaled_list[c(1,4)]))

```

This is the current output when I knit to HTML: [screenshot of the HTML table]

I am facing three issues:

1. When I knit to PDF, I get the following error:

! Package siunitx Error: Invalid token 'N' in numerical input.

Error: LaTeX failed to compile test_table1.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See test_table1.log for more info.

Any ideas about what this might be?

2. The boxplots and histograms are wrong: I only supply plots for the two numeric variables, so the same plots are repeated down the table. How can I make sure each numeric variable gets its own boxplot and histogram, and nothing is displayed for the factor variables?

3. How can I show the levels of the factor variables in their own column, under a 'Category' header, so that the table looks like this:

         Category   N(%)   Mean   SD   Boxplot   Histogram
mpg
class    0
         1
region   A
         B
         C
hp

Thanks very much!
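If the goal is mainly the 'Category' layout sketched above, one alternative to working inside the `datasummary` formula is to assemble the summary rows as an ordinary data frame and style it afterwards. The following is a minimal base-R sketch, not a `datasummary` feature; `summary_rows()` is a hypothetical helper, and it assumes every non-factor column is numeric and that percentages are relative to all rows, as in `N_alt` above.

```r
# Sketch (not datasummary): build the "Variable / Category / N (%) / Mean / SD"
# layout as a plain data frame. summary_rows() is a hypothetical helper.
summary_rows <- function(df) {
  n_total <- nrow(df)
  pieces <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (is.factor(x)) {
      # one row per level; the variable name appears only on the first row
      counts <- as.vector(table(x))
      data.frame(
        Variable = c(nm, rep("", nlevels(x) - 1)),
        Category = levels(x),
        N_pct    = paste0(counts, " (", round(100 * counts / n_total, 1), ")"),
        Mean     = "",
        SD       = "",
        stringsAsFactors = FALSE
      )
    } else {
      # one row per numeric variable, mean and SD rounded to 2 digits
      n <- sum(!is.na(x))
      data.frame(
        Variable = nm,
        Category = "",
        N_pct    = paste0(n, " (", round(100 * n / n_total, 1), ")"),
        Mean     = format(round(mean(x, na.rm = TRUE), 2), nsmall = 2),
        SD       = format(round(sd(x, na.rm = TRUE), 2), nsmall = 2),
        stringsAsFactors = FALSE
      )
    }
  })
  do.call(rbind, pieces)
}

# rebuild the example data from the question
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- factor(ifelse(seq_len(nrow(tmp)) <= 14, "0", "1"))
tmp$region <- factor(rep(c("A", "B", "C"), times = c(14, 6, 12)))
tmp <- tmp[, c("mpg", "class", "region", "hp")]

res <- summary_rows(tmp)
res
```

The resulting data frame could then be handed to `knitr::kable()` with `col.names = c("", "Category", "N (%)", "Mean", "SD")` and extended with `kableExtra` as before; this combination has not been checked here against the PDF output.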

-- Edit:

Regarding issue 3, I am only missing one piece. My code is:

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- "A"
tmp$region[15:20] <- "B"
tmp$region[21:32] <- "C"
tmp$region <- as.factor(tmp$region)

# collapse class and region to a single row each by overwriting them with a constant
tmp$class <- 0
tmp$region <- 0

## change the order of the variables
tmp <- tmp[,c("mpg","class","region","hp")]

# create a list with individual variables
# remove missing and rescale
tmp_scaled <- tmp
tmp_scaled$mpg <- scale(tmp_scaled$mpg)
tmp_scaled$hp <- scale(tmp_scaled$hp)

tmp_scaled_list <- lapply(tmp_scaled, na.omit)

tmp_scaled_list[2] <- list(NULL)
tmp_scaled_list[3] <- list(NULL)

N_alt = function(x) {
  # if() needs a single TRUE/FALSE, so collapse the element-wise test with all()
  if (all(x %in% tmp$class)) {
    '[14 (43.8); 18 (56.3)]'
  } else if (all(x %in% tmp$region)) {
    '[14 (43.8); 6 (18.8); 12 (37.5)]'
  } else {
    '[32 (100)]'
  }
}

Mean_alt = function(x) {
  if (all(x %in% c(tmp$class, tmp$region))) {
    ""
  } else {
    mean(x)
  }
}

# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt + Heading("Mean") * Mean_alt + Heading("Boxplot") * emptycol + Heading("Histogram") * emptycol, data = tmp) %>%
  column_spec(column = 4, image = spec_boxplot(tmp_scaled_list)) %>%
  column_spec(column = 5, image = spec_hist(tmp_scaled_list))

which gives me: [screenshot of the resulting table]

My N_alt function does not work properly. Does anyone know what I am missing here?
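Reading the code above, a likely cause: after `tmp$class <- 0` and `tmp$region <- 0`, both columns are identical vectors of zeros. The function in the formula only receives the values of each row's variable, not its name, so for the `region` row the first test against `tmp$class` already matches and the class string is returned. A minimal sketch of one workaround, assuming the collapsed columns are filled with different constants (1 for `class`, 2 for `region`) instead of 0 for both; `n_pct()`, `class_f` and `region_f` are hypothetical names, and the counts are computed from the factors rather than typed in by hand:

```r
# n_pct() (hypothetical): "count (percent)" for every level of a factor,
# joined into one bracketed string like the ones hardcoded in N_alt
n_pct <- function(x) {
  counts <- as.vector(table(x))
  pct <- round(100 * counts / length(x), 1)
  paste0("[", paste0(counts, " (", pct, ")", collapse = "; "), "]")
}

# in the question these would be copies of tmp$class and tmp$region taken
# before they are overwritten; rebuilt here so the sketch runs on its own
class_f  <- factor(rep(c("0", "1"), times = c(14, 18)))
region_f <- factor(rep(c("A", "B", "C"), times = c(14, 6, 12)))

# assumed workaround: the collapsed columns hold different constants,
# e.g. tmp$class <- 1 and tmp$region <- 2, so the values identify the row
N_alt <- function(x) {
  if (isTRUE(all(x == 1))) {
    n_pct(class_f)
  } else if (isTRUE(all(x == 2))) {
    n_pct(region_f)
  } else {
    n <- sum(!is.na(x))
    paste0("[", n, " (", round(100 * n / length(x), 1), ")]")
  }
}

n_pct(class_f)
n_pct(region_f)
```

Percentages that fall exactly on a .x5 boundary may round differently from the hand-typed strings, since they go through R's `round()`.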

  • The LaTeX error is probably raised because you need to escape (or double escape) the `%` sign. I haven't tried it, but you might consider including empty elements in the list if you don't want the plots to be repeated. Combining factor and numeric variables is tricky. The formula syntax of `datasummary` function relies on the `tables` package. I highly recommend that package's vignette if you want to do a deep dive and learn how to create complex tables: https://cran.r-project.org/web/packages/tables/vignettes/tables.pdf – Vincent Nov 01 '21 at 00:43
  • Thanks Vincent. 1) What do I have to do to "escape (or double escape)" the % sign? Not sure I understand, sorry. 2) I have included `tmp_scaled_list[2] <- list(NULL)` and `tmp_scaled_list[3] <- list(NULL)` (see the code above) to empty those fields, but the output is the same. Any ideas? 3) I am currently investigating; thanks for sharing that vignette. – Daniela Rodrigues Nov 01 '21 at 11:44
  • 1) pre-pend the % with one or two backslashes. 2) Don't know. 3) Good luck! – Vincent Nov 01 '21 at 12:38
  • For the record, #2 is a `kableExtra` issue. – Vincent Nov 01 '21 at 12:39
  • Thanks Vincent. Just to confirm: with 1), do you mean `\%>%` or `\\%>%`? – Daniela Rodrigues Nov 01 '21 at 14:48
  • No, I meant the % sign in your column header. – Vincent Nov 01 '21 at 16:00
  • I see; unfortunately it didn't work. I even replaced `Heading("N (%)")` with `Heading("Number\\(Percent)")`, and even with `Heading("text")`, and still get the same error. – Daniela Rodrigues Nov 01 '21 at 17:20
  • Can you install from Github, restart R and try again? `remotes::install_github("vincentarelbundock/modelsummary")` – Vincent Nov 01 '21 at 18:16
  • It worked!! Thanks so much!! Only issue no. 3 is left :) I found a very clunky way to solve no. 2. – Daniela Rodrigues Nov 01 '21 at 18:29
  • In the Edit section on issue no. 3, I am almost done; only my N_alt function is not working properly at the moment. Do you know why? Thanks in advance! – Daniela Rodrigues Nov 06 '21 at 11:50

1 Answer

Code that solves issue number 2:

---
title: "R Notebook"
output:
  html_document:
    df_print: paged
  html_notebook: default
  pdf_document: default
---

Table 1 example:

```{r, warning=FALSE, message=FALSE, echo=FALSE}
library(magrittr)
library(tidyverse)
library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- "A"
tmp$region[15:20] <- "B"
tmp$region[21:32] <- "C"
tmp$region <- as.factor(tmp$region)

## change the order of the variables
tmp <- tmp[,c("mpg","class","region","hp")]

# create a list with individual variables
# remove missing and rescale
tmp_scaled <- tmp
tmp_scaled$mpg <- scale(tmp_scaled$mpg)
tmp_scaled$hp <- scale(tmp_scaled$hp)
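# duplicate the factor columns so the list gets one element per table row
# (class has 2 levels and region has 3, so 7 rows in total)
tmp_scaled$class2 <- tmp_scaled$class
tmp_scaled$region2 <- tmp_scaled$region
tmp_scaled$region3 <- tmp_scaled$region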

tmp_scaled <- tmp_scaled[,c("mpg","class","class2","region","region2","region3","hp")]

tmp_scaled_list <- lapply(tmp_scaled, na.omit)

tmp_scaled_list[2] <- list(NULL)
tmp_scaled_list[3] <- list(NULL)
tmp_scaled_list[4] <- list(NULL)
tmp_scaled_list[5] <- list(NULL)
tmp_scaled_list[6] <- list(NULL)

N_alt <- function(x) paste0(N(x), ' (', round((as.numeric(N(x))/32)*100,digits=1), ')')

# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + class + region + hp ~ Heading("N (%)") * N_alt + Mean + SD + Heading("Boxplot") * emptycol + Heading("Histogram") * emptycol, data = tmp) %>%
  column_spec(column = 6, image = spec_boxplot(tmp_scaled_list)) %>%
  column_spec(column = 7, image = spec_hist(tmp_scaled_list))


```

The output is now correct in terms of boxplot/histogram:

[screenshot of the corrected HTML table]
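The `list(NULL)` padding above can also be derived from the data rather than by duplicating columns by hand. A minimal sketch, assuming that `datasummary` prints one row per numeric variable and one row per factor level, in column order (7 rows for this data), and that `spec_boxplot()`/`spec_hist()` leave `NULL` entries empty, which is what the answer's code relies on; `plot_list()` is a hypothetical helper:

```r
# plot_list() is a hypothetical helper: one list element per table row,
# standardized values for numeric variables and NULL for each factor level
plot_list <- function(df) {
  out <- list()
  for (nm in names(df)) {
    x <- df[[nm]]
    if (is.factor(x)) {
      # one row per level in the table, so add one empty slot per level
      out <- c(out, rep(list(NULL), nlevels(x)))
    } else {
      # drop missing values and standardize, as in the original code
      out <- c(out, list(as.numeric(scale(na.omit(x)))))
    }
  }
  out
}

# rebuild the example data from the question
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- factor(ifelse(seq_len(nrow(tmp)) <= 14, "0", "1"))
tmp$region <- factor(rep(c("A", "B", "C"), times = c(14, 6, 12)))
tmp <- tmp[, c("mpg", "class", "region", "hp")]

tmp_scaled_list <- plot_list(tmp)  # mpg, class 0/1, region A/B/C, hp
```

With this, `tmp_scaled_list` would replace the `class2`/`region2`/`region3` columns and the manual `list(NULL)` assignments, and could be passed to `spec_boxplot()` and `spec_hist()` exactly as in the `column_spec()` calls above.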