Difficulty with user-defined function to perform operation on multiple variables in R

Question

I am doing t-tests on variables within a dataframe:

library(rstatix)

df <- data.frame(grouping = c(rep("left", 50), rep("right", 50)),
                 var1 = rnorm(100, mean = 21, sd = 3))

var1_result <- df %>% 
  t_test(var1 ~ grouping, paired = TRUE, detailed = TRUE) %>% 
  rstatix::add_significance()

var1_result

I have this working by repeating these lines for each variable, but I would like to replace the repetition with a call to a user-defined function. I tried:

my_t_test <- function(dataset, parameter, grouping_variable) {
  parameter <- dataset %>%
    t_test({{parameter}} ~ {{grouping_variable}}, paired = TRUE, detailed = TRUE) %>%
    add_significance()
  return(parameter)
}
my_t_test(df, var1, grouping)

However, I am encountering the error: "Error in pull(): ! Can't extract columns that don't exist. ✖ Column ... doesn't exist."

I found a few posts that address referring to data frame columns inside a function written in dplyr style (e.g., "How can I write a function in R which accepts column names like dplyr?" and "writing a scoped filter function in dplyr").

I also tried writing my function with "..." instead, as suggested in the first post, but this did not work, and I had trouble generalizing the solutions from the other posts. I would be very interested in learning more about the proper notation and scoping for user-defined functions that use dplyr.

score 0 · Accepted Answer · answered Jul 27 '23 at 18:40

You need to be a bit more careful when putting {{ }} expressions into a formula, because ~ leaves both sides of the formula unevaluated: the formula that reaches t_test() contains the literal expression {{parameter}} rather than the column name var1. One possible workaround is:

my_t_test <- function(dataset, parameter, grouping_variable) {
  # Capture the symbols the caller typed (e.g. var1, grouping) and
  # call `~` on them to build the formula var1 ~ grouping
  formula <- do.call("~", list(rlang::enexpr(parameter),
                               rlang::enexpr(grouping_variable)))
  result <- dataset %>%
    t_test(formula, paired = TRUE, detailed = TRUE) %>%
    add_significance()
  return(result)
}

Here we call the ~ function via do.call() to build the formula, and use rlang::enexpr() to capture the symbols the caller supplied (var1 and grouping) rather than their values.
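To see the difference in isolation, here is a small sketch outside rstatix. The helper names naive_formula() and built_formula() are hypothetical, for illustration only. It compares a formula written with {{ }} inside a function against one built with do.call("~", ...) and rlang::enexpr() as in the answer; all.vars() lists the variable names each formula actually refers to.

```r
library(rlang)

# Hypothetical helper mimicking the question's approach:
# `~` quotes both sides, so the braces and argument names are kept literally
naive_formula <- function(parameter, grouping_variable) {
  {{parameter}} ~ {{grouping_variable}}
}

# Hypothetical helper mimicking the answer's approach:
# capture the caller's symbols, then call `~` on them
built_formula <- function(parameter, grouping_variable) {
  do.call("~", list(enexpr(parameter), enexpr(grouping_variable)))
}

naive <- naive_formula(var1, grouping)
built <- built_formula(var1, grouping)

all.vars(naive)  # refers to the function's own argument names
all.vars(built)  # refers to the columns the caller named
```

The naive version never mentions var1 at all, which is why rstatix later reports a column that doesn't exist.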

Calling the function should produce the same output as the original, non-function code:

my_t_test(df, var1, grouping)
# A tibble: 1 × 14
  estimate .y.   group1 group2    n1    n2 stati…¹     p    df conf.…² conf.…³ method
     <dbl> <chr> <chr>  <chr>  <int> <int>   <dbl> <dbl> <dbl>   <dbl>   <dbl> <chr> 
1    0.114 var1  left   right     50    50   0.162 0.872    49   -1.30    1.53 T-test
# … with 2 more variables: alternative <chr>, p.signif <chr>, and abbreviated
#   variable names ¹statistic, ²conf.low, ³conf.high
# ℹ Use `colnames()` to see all variable names

Note that {{ }} is not standard R syntax; it only works in functions that use rlang's tidy evaluation (mainly those in the "tidyverse"). rstatix::t_test() happens to use dplyr in the back end.
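The question's underlying goal is to run the same test on several columns. One way to do that, sketched here under the assumption that passing column names as strings is acceptable, is a hypothetical variant my_t_test_chr() that builds the formula with base R's reformulate() instead of rlang, and then loops over the names with lapply(). This is a swapped-in technique, not the accepted answer's code; the extra column var2 and the seed value are made up for illustration.

```r
library(rstatix)

# Hypothetical string-based variant (not the accepted answer's code):
# column names arrive as strings and reformulate() turns them into a formula
my_t_test_chr <- function(dataset, parameter, grouping_variable) {
  formula <- reformulate(grouping_variable, response = parameter)  # e.g. var1 ~ grouping
  dataset %>%
    t_test(formula, paired = TRUE, detailed = TRUE) %>%
    add_significance()
}

set.seed(123)  # arbitrary seed so the simulated data are reproducible
df <- data.frame(grouping = rep(c("left", "right"), each = 50),
                 var1 = rnorm(100, mean = 21, sd = 3),
                 var2 = rnorm(100, mean = 10, sd = 2))  # illustrative extra column

vars <- c("var1", "var2")

# One result tibble per column, stacked into a single table
all_results <- dplyr::bind_rows(
  lapply(vars, function(v) my_t_test_chr(df, v, "grouping"))
)
all_results
```

Passing strings sidesteps the quoting subtleties entirely: the bare-name version is more convenient for interactive calls, while the string version is easier to drive from a vector of column names.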

This seems to work, but there is some mild warning language at https://rlang.r-lib.org/reference/defusing-advanced.html about using enexpr(): "enexpr() and enexprs() are like enquo() and enquos() but return naked expressions instead of quosures. These operators should very rarely be used because they lose track of the environment of defused arguments." Is there a risk in using this method? — marcel, Jul 28 '23 at 00:34
I’m not sure what type of “risk” you might be referring to. But in this case you are trying to pass those parameters as symbol names: you explicitly don’t want the calling environment, because you want those symbols to be looked up in your data frame. — MrFlick, Jul 28 '23 at 02:57
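The lookup behaviour described in the comment can be illustrated with base R's model.frame(), used here only as a stand-in (rstatix resolves column names through its own internals, not necessarily this way). The formula built with the answer's technique contains only the symbols var1 and grouping; when it is evaluated against a data frame, those symbols are found among the columns first, so a global object with the same name is ignored. The helper build_formula() and the toy data are hypothetical.

```r
library(rlang)

# Hypothetical helper using the same formula-building step as the answer
build_formula <- function(parameter, grouping_variable) {
  do.call("~", list(enexpr(parameter), enexpr(grouping_variable)))
}

# A global object that shares a column's name (deliberately a different length)
var1 <- c(999, 999)

# Toy data, made up for illustration
df <- data.frame(grouping = c("left", "right", "left", "right"),
                 var1 = c(20.5, 21.0, 19.8, 22.3))

f <- build_formula(var1, grouping)

# model.frame() evaluates the formula's variables in `data` first,
# so the column df$var1 is used rather than the global `var1`
mf <- model.frame(f, data = df)
mf$var1
```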
