
With the expss package for R, the following simple code gives the count of cars for each of the crossed values of 'cyl', 'gear', 'am' and 'vs'. Using a similar layout, would it be possible to replace this count with some statistic computed on a fifth variable (e.g. the median of 'mpg')?

```r
library(expss)

mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(vs, am) %>%
    tab_rows(gear) %>%
    tab_stat_cases() %>%
    tab_pivot()
```
Asked by Nicolas2, edited by zx8754

1 Answer


If I understand correctly, you need:

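```r
library(expss)

mtcars %>%
    tab_cells(mpg) %>%        # the statistic is computed on mpg
    tab_cols(vs, am) %>%      # two separate column banners: vs and am
    # label the row variables, then nest cyl within gear
    tab_rows(set_var_lab(gear, "gear") %nest% set_var_lab(cyl, "cyl")) %>%
    tab_stat_median() %>%
    tab_pivot()
```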

It gives:

```
# |      |    |     |    |     |        |   vs |      |   am |      |
# |      |    |     |    |     |        |    0 |    1 |    0 |    1 |
# | ---- | -- | --- | -- | --- | ------ | ---- | ---- | ---- | ---- |
# | gear |  3 | cyl |  4 | mpg | Median |      | 21.5 | 21.5 |      |
# |      |    |     |  6 | mpg | Median |      | 19.8 | 19.8 |      |
# |      |    |     |  8 | mpg | Median | 15.2 |      | 15.2 |      |
# |      |  4 | cyl |  4 | mpg | Median |      | 25.9 | 23.6 | 28.9 |
# |      |    |     |  6 | mpg | Median | 21.0 | 18.5 | 18.5 | 21.0 |
# |      |    |     |  8 | mpg | Median |      |      |      |      |
# |      |  5 | cyl |  4 | mpg | Median | 26.0 | 30.4 |      | 28.2 |
# |      |    |     |  6 | mpg | Median | 19.7 |      |      | 19.7 |
# |      |    |     |  8 | mpg | Median | 15.4 |      |      | 15.4 |
```
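As a sanity check, the medians in this table can be reproduced in base R without expss. This is a sketch, not part of the original answer: `aggregate()` is standard base R, while the names `med_vs` and `med_am` are illustrative. Because `tab_cols(vs, am)` produces two separate banners rather than a nested one, each banner is computed on its own.

```r
# Base R cross-check of the expss table (not expss itself).
# Median mpg for every gear x cyl x vs combination present in the data.
med_vs <- aggregate(mpg ~ gear + cyl + vs, data = mtcars, FUN = median)

# The same computation for the separate "am" banner.
med_am <- aggregate(mpg ~ gear + cyl + am, data = mtcars, FUN = median)

print(med_vs)
print(med_am)
```

`aggregate()` with a formula drops combinations that have no cars, which matches the empty gear 4 / cyl 8 row in the expss output.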

UPDATE:

  • tab_rows - row grouping variables
  • tab_cols - column grouping variables
  • tab_cells - variables on which we calculate statistics. This is natural for summary statistics such as median, mean, etc., but can be confusing when we calculate cases or column percentages. You can find the documentation by typing ?tab_cells in the console.

```
|          |    tab_cols     |
| tab_rows | stat(tab_cells) |
```
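To make the mapping concrete, here is a rough base R analogue of the layout sketch, using the same variables as the answer. It is not how expss works internally; `tapply()` and `ftable()` are base R, and `med` and `flat` are illustrative names. The grouping variables play the roles of `tab_rows` and `tab_cols`, `mpg` plays the role of `tab_cells`, and `median` is the statistic.

```r
# Hypothetical base R analogue of the layout (not the expss API):
#   tab_rows  -> row variables of the flat table (gear, cyl)
#   tab_cols  -> column variable of the flat table (vs)
#   tab_cells -> the variable the statistic is computed on (mpg)
#   stat      -> the summary function (median)
med <- with(mtcars, tapply(mpg, list(gear = gear, cyl = cyl, vs = vs), median))

# Combinations with no cars (e.g. gear 4 with 8 cylinders) are NA.
flat <- ftable(med, row.vars = c("gear", "cyl"), col.vars = "vs")
print(flat)
```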

To show the count of cars alongside the median:

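```r
library(expss)

mtcars %>%
    tab_cells(mpg) %>%
    tab_cols(vs, am) %>%
    tab_rows(set_var_lab(gear, "gear") %nest% set_var_lab(cyl, "cyl")) %>%
    tab_stat_median() %>%
    tab_stat_valid_n(label = "#Total") %>%    # number of non-missing mpg values
    # place Median and #Total one under the other within each row group
    tab_pivot(stat_position = "inside_rows")
```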
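A quick base R cross-check of the `#Total` rows (a sketch, not expss): `mpg` has no missing values in `mtcars`, so the number of valid `mpg` values in each cell equals the number of cars, and `table()` should give the same counts. `n_vs` and `n_am` are illustrative names.

```r
# Base R cross-check of the "#Total" counts (not expss itself).
# Counts of cars in each gear x cyl x vs and gear x cyl x am cell.
n_vs <- with(mtcars, table(gear, cyl, vs))
n_am <- with(mtcars, table(gear, cyl, am))

# Flatten with gear and cyl as rows, as in the expss layout.
print(ftable(n_vs, row.vars = c("gear", "cyl")))
print(ftable(n_am, row.vars = c("gear", "cyl")))
```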

You can control the number of decimal places with expss_digits(), but it applies to the entire table. Alternatively, if you are using RStudio on Windows, you can try expss_output_viewer() to display the output in the RStudio Viewer. In that case, rows whose labels start with "#" (such as "#Total") are shown without decimals.

Answered by Gregory Demin
  • Thanks. I didn't understand the meaning of tab_cells. – Nicolas2 May 15 '18 at 12:27
  • Actually I also needed nesting on vs and am, but that isn't an issue. Anyway, thanks. I didn't understand the meaning of tab_cells: is there a document that explains how to assemble the different functions? The examples are not enough to understand the logic. Additional question: what if I want both the median of mpg and the count of cars (one after the other, with the count shown without decimals)? – Nicolas2 May 15 '18 at 12:33
  • Thanks again. I eventually got rid of the unwanted decimals by using a linear layout of the statistics, with tab_stat_fun(Median = w_median, N = w_n, method = list). – Nicolas2 May 15 '18 at 14:27
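For comparison, a similar "one statistic after another" layout can be sketched in base R. This is not the expss tab_stat_fun() call from the comment; the anonymous helper function and the name `stats` are illustrative. A function that returns a named vector makes `aggregate()` store Median and N as a matrix column, which `do.call(data.frame, ...)` then flattens into ordinary columns.

```r
# Base R sketch of two statistics side by side (not expss).
# The helper returns a named vector, so aggregate() stores the
# results as a two-column matrix named "mpg".
stats <- aggregate(
  mpg ~ gear + cyl,
  data = mtcars,
  FUN = function(x) c(Median = median(x), N = length(x))
)

# Flatten the matrix column into mpg.Median and mpg.N.
stats <- do.call(data.frame, stats)
print(stats)
```

The grouping can be extended to vs and am by adding them to the formula, e.g. `mpg ~ gear + cyl + vs + am`.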