Problem
By using dplyr::summarize_at()
(or equivalent), I would like to get a table of summaries in which columns are sorted first by (G) order of grouping variables used, then by (V) order of variables passed and lastly by (F) order of functions applied. The default order is determined first by G, then by F and lastly by V.
Example
The code:
library(purrr)
library(dplyr)
q025 <- partial(quantile, probs = 0.025, na.rm = TRUE)
q975 <- partial(quantile, probs = 0.975, na.rm = TRUE)
vars_to_summarize <- c("height", "mass")
my_summary <- starwars %>%
filter(skin_color %in% c("gold", "green")) %>%
group_by(skin_color) %>%
summarise_at(vars_to_summarize, funs(q025, mean, q975))
Results in:
my_summary
## A tibble: 2 x 7
## skin_color height_q025 mass_q025 height_mean mass_mean height_q975 mass_q975
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 gold 167.000 75.0 167 75 167.00 75.0
## 2 green 79.375 22.7 169 NA 204.75 110.4
And the desired order of variables should be:
skin_color, height_q025, height_mean, height_q975, mass_q025, mass_mean, mass_q975
I would like to use something like this (naively simple) code:
my_summary %>%
select(everything(), starts_with(vars_to_summarize))
But it does not work. Even this code does not work as I expect (even though it's not a general solution I seek):
my_summary %>%
select(everything(),
starts_with(vars_to_summarize[1]),
starts_with(vars_to_summarize[2]))
Most probably everything()
should always be the last argument in select()
.
To generalize
Say, I have:
- N grouping variables ("gr_") that I pass to
group_by()
, - L variables ("var_") that have to be summarized and
- M summary functions to apply ("fun_").
In general, the desired order of variables in the summary table should follow the pattern:
gr_1, gr_2, ..., gr_N,
var_1_fun_1, var_1_fun_2, ..., var_1_fun_M,
var_2_fun_1, var_2_fun_2, ..., var_2_fun_M,
...,
var_L_fun_1, var_L_fun_2, ..., var_L_fun_M