I've been reading through programming with dplyr and trying to apply the ideas it describes in my work. I have something that works, but it's unclear to me whether I've done it in the "right" way. Is there something more elegant or concise I could be doing?
I have a tibble where rows are scenarios and columns relate to tests that were run in that scenario. There are two types of columns: those that store a test statistic computed in that scenario and those that store the degrees of freedom of that test.
So, here's a small, toy example of the type of data I have:
library(tidyverse)
set.seed(27599)
my_tbl <- tibble(test1_stat = rnorm(12), test1_df = rep(x = c(1, 2, 3), times = 4),
                 test2_stat = rnorm(12), test2_df = rep(x = c(1, 2, 3, 4), times = 3))
I want to compute a summary of each test that is based on both its stat and its df. My example here is that I want to compute the median stat for each group, where groups are defined by df. The groupings are not guaranteed to be the same across tests, nor is the number of groups guaranteed to be the same.
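To make the point about differing groupings concrete, here is a quick check on the toy data. It is only a sketch: it rebuilds the same tibble so it can be run on its own, and uses count() purely for illustration.

```r
library(dplyr)

# Rebuild the toy data from above so this snippet is self-contained
set.seed(27599)
my_tbl <- tibble(
  test1_stat = rnorm(12), test1_df = rep(c(1, 2, 3), times = 4),
  test2_stat = rnorm(12), test2_df = rep(c(1, 2, 3, 4), times = 3)
)

# test1 has three df groups with four rows each
test1_groups <- count(my_tbl, test1_df)
# test2 has four df groups with three rows each
test2_groups <- count(my_tbl, test2_df)

print(test1_groups)
print(test2_groups)
```

So any summary has to be grouped per test; a single group_by() over both df columns would not give what I want.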
So, here's what I've done:
get_test_median = function(df, test_name) {
  stat_col_name <- paste0(test_name, '_stat')
  df_col_name <- paste0(test_name, '_df')
  median_col_name <- paste0(test_name, '_median')
  df %>%
    dplyr::group_by(rlang::UQ(rlang::sym(df_col_name))) %>%
    dplyr::summarise(rlang::UQ(median_col_name) := median(x = rlang::UQ(rlang::sym(stat_col_name)), na.rm = TRUE))
}
my_tbl %>% get_test_median(test_name = 'test1')
my_tbl %>% get_test_median(test_name = 'test2')
This works. But is it how an experienced rlang user would do it? I am new to NSE, and a bit surprised to be using two nested rlang functions repeatedly (UQ(sym(.))).
I am happy using UQ rather than !!, just because I'm more comfortable with traditional function notation.
Based on the comments, I got rid of the namespace::function notation for the rlang calls, and now my function doesn't look so verbose:
get_test_median = function(df, test_name) {
  stat_col_name <- paste0(test_name, '_stat')
  df_col_name <- paste0(test_name, '_df')
  median_col_name <- paste0(test_name, '_median')
  df %>%
    dplyr::group_by(UQ(sym(df_col_name))) %>%
    dplyr::summarise(UQ(median_col_name) := median(x = UQ(sym(stat_col_name)), na.rm = TRUE))
}
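For comparison, here is a sketch of what I think the same function could look like with the .data pronoun, which seems to avoid the nested UQ(sym(.)) calls. Assumptions: a reasonably recent dplyr (1.0 or later) and rlang; the name get_test_median_data is made up for this example; and it uses !! on the left of := because, as far as I can tell, UQ() has been deprecated in later rlang releases.

```r
library(dplyr)

# Rebuild the toy data so this snippet is self-contained
set.seed(27599)
my_tbl <- tibble(
  test1_stat = rnorm(12), test1_df = rep(c(1, 2, 3), times = 4),
  test2_stat = rnorm(12), test2_df = rep(c(1, 2, 3, 4), times = 3)
)

# Hypothetical alternative: look columns up by name through the .data pronoun
get_test_median_data <- function(df, test_name) {
  stat_col_name <- paste0(test_name, "_stat")
  df_col_name <- paste0(test_name, "_df")
  median_col_name <- paste0(test_name, "_median")
  df %>%
    # .data[[name]] refers to the column whose name is stored in the string
    group_by(.data[[df_col_name]]) %>%
    # !! injects the string as the name of the new summary column
    summarise(!!median_col_name := median(.data[[stat_col_name]], na.rm = TRUE))
}

res1 <- get_test_median_data(my_tbl, "test1")
res2 <- get_test_median_data(my_tbl, "test2")
print(res1)
print(res2)
```

I have not confirmed this is what an experienced rlang user would write, so treat it as one candidate rather than the answer.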