I want to construct function calls dynamically with varying grouping variables/arguments using dplyr. The number of function calls may be quite large, which makes the approach shown in the examples of the Programming with dplyr vignette impractical. Ideally I would construct an object (e.g. a list) beforehand that stores the arguments/variables to be passed in each function call. Below is an example dataset to which we want to apply some summarising functions with changing grouping variables.
library(dplyr)

set.seed(1)
df <- data.frame(values = sample(x = 1:10, size = 10),
                 grouping_var1 = sample(x = letters[1:2], size = 10, replace = TRUE),
                 grouping_var2 = sample(x = letters[24:26], size = 10, replace = TRUE),
                 grouping_var3 = sample(x = LETTERS[1:2], size = 10, replace = TRUE))
> df
   values grouping_var1 grouping_var2 grouping_var3
1       9             a             x             B
2       4             a             z             B
3       7             a             x             A
4       1             a             x             B
5       2             a             x             A
6       5             b             x             A
7       3             b             y             B
8      10             b             x             A
9       6             b             x             B
10      8             a             y             B
Following the programming with dplyr vignette we could come up with a solution like this:
f <- function(df, ...){
  group_var <- enquos(...)
  df %>%
    group_by(!!! group_var) %>%
    summarise_at(.vars = "values", .funs = sum) %>%
    print(n = 10)
}
> f(df, grouping_var1)
# A tibble: 2 x 2
  grouping_var1 values
  <fct>          <int>
1 a                 31
2 b                 24
> f(df, grouping_var1, grouping_var2)
# A tibble: 5 x 3
# Groups:   grouping_var1 [2]
  grouping_var1 grouping_var2 values
  <fct>         <fct>          <int>
1 a             x                 19
2 a             y                  8
3 a             z                  4
4 b             x                 21
5 b             y                  3
The example above is impractical and inflexible if I want to construct a large number of calls. Another limitation is that any other information I may wish to include cannot easily be passed along with the grouping variables.
Assume I have a list containing the grouping variables I want to pass in each function call. Assume also that each of those list elements has a corresponding field with an "id" describing the grouping that was performed. See below for an example:
list(group_vars = list(c("grouping_var1"),
                       c("grouping_var1", "grouping_var2"),
                       c("grouping_var1", "grouping_var3")),
     group_ids = list("var_1",
                      c("var_1_2"),
                      c("var_1_3")))
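Because the grouping variables in a list like this are stored as strings, they have to be turned into something group_by() can evaluate. A minimal sketch of one possible direction, assuming rlang::syms() is used to convert the strings into symbols that are then spliced with !!!. The helper name sum_by and the toy data are made up for illustration and are not the sampled df above:

```r
library(dplyr)

# Stand-in data with the same column names as df (values are made up).
toy <- data.frame(
  values        = 1:6,
  grouping_var1 = c("a", "a", "a", "b", "b", "b"),
  grouping_var2 = c("x", "y", "x", "x", "y", "y")
)

# Hypothetical helper: group by column names supplied as a character vector.
sum_by <- function(data, group_vars) {
  data %>%
    # syms() turns each string into a symbol; !!! splices them into group_by()
    group_by(!!!rlang::syms(group_vars)) %>%
    summarise(values = sum(values))
}

sum_by(toy, c("grouping_var1", "grouping_var2"))
```

This avoids the group_by_ family entirely; whether it scales comfortably to a large number of calls is a separate matter.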
How do I dynamically pass these lists of arguments/variables and ids to function calls and have them be successfully evaluated using dplyr? Let's say I want to create a column in the resulting data frame which, aside from the summarised data, also contains the group_ids. For example, if my group_vars were c("grouping_var1", "grouping_var2") and the group_ids was "var_1_2" for a specific function call, I would expect the output:
# A tibble: 5 x 4
# Groups:   grouping_var1 [2]
  grouping_var1 grouping_var2 values group_ids
  <fct>         <fct>          <int> <chr>
1 a             x                 19 var_1_2
2 a             y                  8 var_1_2
3 a             z                  4 var_1_2
4 b             x                 21 var_1_2
5 b             y                  3 var_1_2
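To make the target concrete, here is a rough sketch of how the two lists might be walked in parallel, producing one summary per pair with the id attached as a column. It assumes base R's Map() for the iteration and rlang::syms() for the string-to-symbol conversion. The names specs and summarise_spec and the toy data are hypothetical, chosen only for this illustration:

```r
library(dplyr)

# Stand-in data with the same column names as df (values are made up).
toy <- data.frame(
  values        = 1:6,
  grouping_var1 = c("a", "a", "a", "b", "b", "b"),
  grouping_var2 = c("x", "y", "x", "x", "y", "y"),
  grouping_var3 = c("A", "B", "A", "B", "A", "B")
)

# The specification list from the question, under the hypothetical name `specs`.
specs <- list(
  group_vars = list("grouping_var1",
                    c("grouping_var1", "grouping_var2"),
                    c("grouping_var1", "grouping_var3")),
  group_ids  = list("var_1", "var_1_2", "var_1_3")
)

# Hypothetical helper: summarise for one (group_vars, group_id) pair.
summarise_spec <- function(data, group_vars, group_id) {
  data %>%
    group_by(!!!rlang::syms(group_vars)) %>%
    summarise(values = sum(values)) %>%
    mutate(group_ids = group_id)  # the length-1 id is recycled to every row
}

# Map() pairs the i-th grouping vector with the i-th id and
# returns a list holding one tibble per specification.
results <- Map(function(vars, id) summarise_spec(toy, vars, id),
               specs$group_vars, specs$group_ids)

results[[2]]
```

Keeping the results as a list leaves each tibble with its own set of grouping columns; combining them into a single data frame (for example with bind_rows()) would fill the non-shared grouping columns with NA.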
I am hoping to see a solution that does this without using the now-deprecated group_by_ functions, which accept strings.
As a closing note, I find it rather discouraging that programming with dplyr in functions using NSE has such a barrier to entry. Any time I get stuck on something that should be simple, it takes hours to find a solution.