Iterate over specified columns for crosstabs in R

Question

I am looking to run a couple of dozen crosstabs within the same dataset and with a set outcome variable. I have a function that gives me the crosstabs I want:

library(dplyr)    # for %>%
library(janitor)  # for tabyl() and the adorn_*() functions

second_table = function(dat, variable1, variable2){

  dat %>%
    tabyl({{variable1}}, {{variable2}}, show_na = FALSE) %>%
    adorn_percentages("row") %>%
    adorn_pct_formatting(digits = 1) %>%
    adorn_ns()

}

Using the mtcars dataset as an example, the function gives me what I want for a single variable:

cars = datasets::mtcars

second_table(cars, cyl, vs)

What I really want, though, is to create lots of these tables where the dat = cars and variable2 = vs arguments stay the same, but using several different columns as the variable1 argument. For the purposes of this example, say it's the following 4 variables:

variables = c("cyl", "am", "gear", "carb")

I'm not sure if a map function from the purrr package is the best way to do this, but I've been unsuccessfully trying all sorts of different things with map and related functions like map_at. If there is a way to do this with purrr then that's what I'd prefer to do, but I'm open to any suggestions. I don't really care what the output looks like, just that I can get all the crosstabs I need without copying and pasting code lots of times.

Any help is greatly appreciated!

I guess this only works with at most 3 variables. According to `?tabyl`: `Specify a data.frame and the one, two, or three unquoted column names you want to tabulate. Three variables generates a list of 2-way tabyls, split by the third variable.` So if you have more than 3, maybe use `ftable`, i.e. `ftable(cars[variables])` — akrun, Mar 23 '21 at 16:38
The accepted answer is more what I was looking for but I wasn't aware of the ftable function! — Emily Halford, Mar 23 '21 at 16:50
I was not sure whether you need to do this individually or together — akrun, Mar 23 '21 at 16:51
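
For reference, the `ftable` suggestion from the comments can be tried in base R with no extra packages. This is only a sketch of that one call, using the `cars` and `variables` objects from the question; the name `flat` is made up for illustration. It produces a single flat table over all four columns together, not one crosstab per variable against vs.

```r
# Base R only: one flat contingency table across all the listed columns
cars <- datasets::mtcars
variables <- c("cyl", "am", "gear", "carb")

# `flat` is an illustrative name, not something from the original post
flat <- ftable(cars[variables])
flat
```

Each cell counts the cars that share one combination of cyl, am, gear and carb, so the counts add up to the number of rows in the data.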

score 7 · Accepted Answer · answered Mar 23 '21 at 16:46

Since your dataset and second variable are fixed you can simplify the process like so:

library(tidyverse)
library(janitor)

imap(set_names(c("cyl", "am", "gear", "carb")), ~ mtcars %>%
       tabyl(!!rlang::sym(.x), vs, show_na = FALSE) %>% 
       adorn_percentages("row") %>% 
       adorn_pct_formatting(digits = 1) %>% 
       adorn_ns() 
)

Output

$cyl
 cyl           0          1
   4   9.1%  (1) 90.9% (10)
   6  42.9%  (3) 57.1%  (4)
   8 100.0% (14)  0.0%  (0)

$am
 am          0         1
  0 63.2% (12) 36.8% (7)
  1 46.2%  (6) 53.8% (7)

$gear
 gear          0          1
    3 80.0% (12) 20.0%  (3)
    4 16.7%  (2) 83.3% (10)
    5 80.0%  (4) 20.0%  (1)

$carb
 carb          0          1
    1   0.0% (0) 100.0% (7)
    2  50.0% (5)  50.0% (5)
    3 100.0% (3)   0.0% (0)
    4  80.0% (8)  20.0% (2)
    6 100.0% (1)   0.0% (0)
    8 100.0% (1)   0.0% (0)

I used purrr::imap and purrr::set_names (re-exported from the rlang package) so that the output list is named after the variables.
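
To see what set_names is contributing, here is a small sketch that leaves janitor out and uses base `table()` as a stand-in for the crosstab. The object names (`named_variables`, `unnamed`, `named`) are made up for illustration.

```r
library(purrr)

variables <- c("cyl", "am", "gear", "carb")

# With no names supplied, set_names() uses the values themselves as names
named_variables <- set_names(variables)
names(named_variables)

# Mapping over the plain vector gives an unnamed list ([[1]], [[2]], ...)
unnamed <- map(variables, ~ table(mtcars[[.x]], mtcars$vs))
names(unnamed)

# Mapping over the named vector carries the names through ($cyl, $am, ...)
named <- map(named_variables, ~ table(mtcars[[.x]], mtcars$vs))
names(named)
```

The named version lets you pull out a single table with something like `named$gear` instead of remembering its position.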

This is exactly what I was looking for - thank you. And thanks for explaining what set_names is doing! — Emily Halford, Mar 23 '21 at 16:49

score 4 · Answer 2 · answered Mar 23 '21 at 17:47

If you want to reuse your function, you have to make a small change:

library(rlang)

second_table2 = function(dat, variable1, variable2){
  variable1 <- sym(variable1)
  
  dat %>% 
    tabyl(!!variable1, {{variable2}}, show_na = FALSE) %>% 
    adorn_percentages("row") %>% 
    adorn_pct_formatting(digits = 1) %>% 
    adorn_ns() 
  
}

I checked this works well and maybe has better readability:

R> map(variables, ~second_table2(cars, .x, vs))
[[1]]
 cyl           0          1
   4   9.1%  (1) 90.9% (10)
   6  42.9%  (3) 57.1%  (4)
   8 100.0% (14)  0.0%  (0)

[[2]]
 am          0         1
  0 63.2% (12) 36.8% (7)
  1 46.2%  (6) 53.8% (7)

[[3]]
 gear          0          1
    3 80.0% (12) 20.0%  (3)
    4 16.7%  (2) 83.3% (10)
    5 80.0%  (4) 20.0%  (1)

[[4]]
 carb          0          1
    1   0.0% (0) 100.0% (7)
    2  50.0% (5)  50.0% (5)
    3 100.0% (3)   0.0% (0)
    4  80.0% (8)  20.0% (2)
    6 100.0% (1)   0.0% (0)
    8 100.0% (1)   0.0% (0)

Of course, you can use @LMc's great recommendations to make this more informative.
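
One possible way to combine the two answers, as a sketch: wrap the variable names in set_names() before mapping, so the list comes back named ($cyl, $am, ...) rather than numbered. The function is repeated here so the snippet runs on its own; `tables` is just an illustrative name, and loading dplyr for the pipe is an assumption (any package that provides %>% would do).

```r
library(dplyr)    # assumed here only for %>%
library(janitor)  # tabyl() and the adorn_*() helpers
library(purrr)    # map(), set_names()
library(rlang)    # sym()

cars <- datasets::mtcars
variables <- c("cyl", "am", "gear", "carb")

second_table2 <- function(dat, variable1, variable2) {
  # variable1 arrives as a string, so turn it into a symbol first
  variable1 <- sym(variable1)

  dat %>%
    tabyl(!!variable1, {{ variable2 }}, show_na = FALSE) %>%
    adorn_percentages("row") %>%
    adorn_pct_formatting(digits = 1) %>%
    adorn_ns()
}

# `tables` is an illustrative name; set_names() names each element after its variable
tables <- map(set_names(variables), ~ second_table2(cars, .x, vs))
tables$gear
```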

HTH.

Thanks for this answer, it's really useful to see how I could have used my original function! — Emily Halford, Mar 23 '21 at 18:08
