3

I'd like to create large crosstables with summary statistics of multiple row variables by multiple column variables - and found two packages that make it very easy to beautiful large tables: the tables package by Duncan Murdoch and the expss package by Gregory Demin (which does amazing things beside tables). There are some other packages, as moonBook (which works together with ztable package by the same author), but as far as I saw most of them lack something I need:

I want to ...

  1. (reproducibly) create large summary tables
  2. use case weights for these summary statistics
  3. use variable and value labels for the variables
  4. create both html and pdf tables without too much effort (w/o changing function options/chunk options).

ad 1) Both tables and expss make it easy to create complex tables with multiple row and col vars. As an example we produce a summary table of the iris data with tables and the table functions in expss.

library(magrittr)  # Pipes %>% 

data(iris)  # Use iris data for examples

## Tables with the `tables` packages
library(tables)
tabular( (Species + 1) ~ (n=1) + Format(digits=2)*(Sepal.Length + Sepal.Width)*(mean + sd + Format(digits=0)*(n=1)), data=iris )

## Tables with the `expss` package
library(expss)
iris %>%
  tab_cells(Sepal.Length, Sepal.Width) %>% 
  tab_cols(Species, total()) %>% 
  tab_stat_mean_sd_n() %>% 
  tab_transpose() %>% 
  tab_pivot() 

ad 2) With expss it's easy to use labels

iris_raw <- iris  # Save iris for later without any labels (they cause some problems with tabular)

iris <- apply_labels(iris,
                     Sepal.Length = "Sepal Length",
                     Sepal.Width = "Sepal With",
                     Species = "Species of Iris", 
                     Species = c("Setosa" = "setosa",
                                 "Versicolor" = "versicolor",
                                 "Virginica" = "virginica"))
str(iris)  # we can see the labels

library(expss)
expss_digits(digits = 2)
iris %>%
  tab_cells(Sepal.Length, Sepal.Width) %>% 
  tab_cols(Species, total()) %>% 
  tab_stat_mean_sd_n() %>% 
  tab_transpose() %>% 
  tab_pivot()

ad 3) For surveys we often need case weights, e.g. sampling design weights, nonresponse weights, post-stratification weights - that determine how much weight a single case has in the computation of descriptive statistics and in for model estimation. expss makes it possible to use case weights just by adding one single line: tab_weight(caseweight)

set.seed(123)  # Make following random numbers reproducible
iris$caseweight <- rnorm(nrow(iris), mean = 1, sd = .5) %>% abs() # Add some positive random caseweight to each case

library(expss)
expss_digits(digits = 2)
iris %>%
  tab_cells(Sepal.Length, Sepal.Width) %>% 
  tab_cols(Species, total()) %>% 
  tab_weight(caseweight) %>%  # Weight the cases
  tab_stat_mean_sd_n(weighted_valid_n = TRUE) %>% 
  tab_last_round(digits = 2) %>% 
  tab_transpose() %>% 
  tab_pivot() 

In tables it's also possible to use (custom) functions for calculating weighted summary statistics, but not as easy as with expss (I might be wrong here - if that's the case, please correct me).

ad 4) Coming to the last point of my wishlist: both html and pdf tables. Now that's easy with tables and much harder with expss. In tables the function toKable() and piping the output to the kableExtra for further refinement is the key.

library(tables)
tab <- tabular( (Species + 1) ~ (n=1) + Format(digits=2)*(Sepal.Length + Sepal.Width)*(mean + sd + Format(digits=0)*(n=1)), data=iris_raw)
tab %>% 
  toKable() %>%  # Translates the tabular-object to a kable-object
  kable_styling(full_width = TRUE) %>%  # Functions from kableExtra
  column_spec(2, color = "red")


# Remark: in expss it's possible to transform the (html-) table output to a data frame via `split_table_to_df()`, s. https://stackoverflow.com/questions/55542838/formatting-tables-in-r-markdown-to-export-to-ms-word-document/55576202#55576202
# But all the formatting gets lost - since it's just a df then.

library(expss)
expss_digits(digits = 2)
tab <- iris %>%
  tab_cells(Sepal.Length, Sepal.Width) %>% 
  tab_cols(Species, total()) %>% 
  tab_stat_mean_sd_n() %>% 
  tab_transpose() %>% 
  tab_pivot()

tab.df <- split_table_to_df(tab)  # Here the table 
str(tab.df)  # a df with the numbers and the labels
kable(tab.df)  # We could use kable on this (but the result does not look like a usual kable table) 

So each of the two packages has its superpowers: expss does an amazing job in creating tables with labels and case weights and tables makes it easy to use the table output from tabular() to create both html and pdf tables via via toKable, kable and kableExtra - and because the kableExtra package by Hao Zhu produces html or pdf depending on the type of document that gets knit - that's really simple, without changing any code (e.g. from "html" to "latex"), just by pressing Knit to pdf/html -- and that works perfectly well.

the rendered table for html

the rendered table in the pdf

Question: For an easy reproducible workflow it would be great to have all these features (1 to 4) at once and thus to combine expss with knitrExtra - are there any functions like toKable for tables from expss (or more general for html tables) that would make a refinement via kableExtra and easy html and pdf export without changing any options possible? Or are there other workflows to achieve 1 to 4? Thanks for your time and any hint!

gplngr
  • 77
  • 7
  • You can easily use custom functions in the`expss` with the `tab_stat_fun` function. If you use weights your custom function should have `weight` argument. As for pdf output - by now there is no an easy solution. It seems I will address this issue via the integration with `flextable` package but not in the nearest future. – Gregory Demin Aug 19 '19 at 20:46
  • Thanks! I'll try to build custom functions, but at the moment means, sds and frequencies are all I need. I played around with `split_table_to_df(tab)` and find it super helpful. My workflow now is: I build the table with the table functions of `expss` -> `split_table()` -> `kable()` -> `kableExtra` functions - and that works very smooth. Thanks for your great work @GregoryDemin – gplngr Aug 20 '19 at 10:46

1 Answers1

2

Another option is to use pander from the pander package. It recognises the data frame from split_table_to_df and produces a nice output for PDF documents.

  • Thanks - yes, absolutely: both `kable` and `pander` are wonderful for creating tables in R markdown. `kable()` is really simple and `kableExtra` is really great for editing tables, such as adding lines, highlighting cells and much more. `pander` is much more powerful out of the box and has built in functions for lots of object classes and seems to do much more than just tables (and it seems to work nicely together with `knitr::kable()`, s. https://cran.r-project.org/web/packages/pander/vignettes/knitr.html). I'll have a look at it. – gplngr Aug 28 '19 at 10:23