
I'm trying to use dplyr to filter based on a dynamic variable.

I've figured out that to get filter to work, I need to enclose the unquoted variable in parentheses. However, when I wrap this in a function, it does not work properly.

library(dplyr)

df_ex <- data.frame(a = 1:10, b = 11:20)

param <- quo(a)

# returns df_ex with column a, only, as expected
df_ex %>%
  dplyr::select(!!param)

# returns expected df
df_ex %>%
  dplyr::filter((!!param) == 5)

# Now for the function
testfun <- function(test_df, filt_var){
   filt_var_mod <- quo(filt_var)

   test_df %>%
    dplyr::filter((!!filt_var_mod)==5)
}

# returns empty df, not as expected
testfun(df_ex, "a")

I would like to learn to find the answers to these types of questions about dplyr for myself, so please feel free to refer me to the relevant part of the programming vignette.

thelatemail
matsuo_basho

4 Answers


If your function accepts the column name as a character string, there is no need to quote it. Instead, convert the string to a symbol with rlang::sym and unquote it immediately inside filter with UQ or !!, following the NSE syntax:

testfun <- function(test_df, filt_var){
    test_df %>%
        dplyr::filter((!!rlang::sym(filt_var)) == 5)
}

testfun(df_ex, "a")
#  a  b
#1 5 15
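As a possible alternative for the string case, the `.data` pronoun that dplyr re-exports from rlang can index a column by name without building a symbol first. This is a minimal sketch, not part of the original answer; the function name `testfun_data` is made up for illustration.

```r
library(dplyr)

df_ex <- data.frame(a = 1:10, b = 11:20)

# Hypothetical variant of testfun: filt_var is a string, and .data[[...]]
# looks up the column with that name inside the data frame being filtered.
testfun_data <- function(test_df, filt_var) {
  test_df %>%
    dplyr::filter(.data[[filt_var]] == 5)
}

res_data <- testfun_data(df_ex, "a")
```

Here `res_data` should hold the single row where `a` equals 5.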

If you want to type the column names without quotes, then you need enquo, which

takes a symbol referring to a function argument, quotes the R code that was supplied to this argument, captures the environment where the function was called (and thus where the R code was typed), and bundles them in a quosure.

testfun <- function(test_df, filt_var){
    filt_var_mod <- enquo(filt_var)
    test_df %>%
        dplyr::filter((!!filt_var_mod) == 5)
}

testfun(df_ex, a)
#  a  b
#1 5 15
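To see why the `quo()` version in the question returns an empty data frame, it may help to inspect what each function captures. The sketch below uses `rlang::quo_get_expr`; the helper names `with_quo` and `with_enquo` are made up for illustration.

```r
library(dplyr)
library(rlang)

df_ex <- data.frame(a = 1:10, b = 11:20)

# quo() quotes the expression written inside it, so it captures the
# symbol `filt_var` itself, not what the caller passed in.
with_quo <- function(filt_var) quo_get_expr(quo(filt_var))

# enquo() quotes the expression the caller supplied for the argument.
with_enquo <- function(filt_var) quo_get_expr(enquo(filt_var))

quo_captured <- with_quo("a")    # the symbol filt_var
enquo_captured <- with_enquo(a)  # the symbol a

# In the question's testfun, filt_var evaluates to the string "a", so the
# condition is effectively "a" == 5, which is FALSE for every row.
empty_result <- df_ex %>% dplyr::filter("a" == 5)
```

So the original function compares the string "a" with 5 rather than comparing column a with 5, which matches the empty result reported in the question.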
Psidom

Technically you don't need rlang or tidyeval or tibbles or dplyr for this kind of problem. Base R leaves practically no sacred cows in what you can do with quote, eval, parse, and the other NSE tools that are baked in from the bottom up.

Edit: Much more elegant solution proposed by @thelatemail

df_ex <- data.frame(a = 1:10, b = 11:20)

testfun <- function(test_df, filt_var) {
  test_df[test_df[,filt_var] == 5,]
}    

testfun(df_ex, "a")

Returns

  a  b
5 5 15

Just for fun, a data.table option could work as well:

library(data.table)

df_ex <- data.frame(a = 1:10, b = 11:20)

testfun <- function(test_df, filt_var) {
  setDT(test_df,key = filt_var)[.(5)]
}

testfun(df_ex, "a")

Returns:

   a  b
1: 5 15
Matt Summersgill
  • Or just `test_df[test_df[,filt_var] == 5,]` and do away with all this eval, parse, etc etc. Using `$` interactively is just asking for trouble - `fortunes::fortune(312)` – thelatemail Oct 26 '17 at 22:39
  • Ha! that's a far simpler solution, didn't know that was even possible! – Matt Summersgill Oct 26 '17 at 22:46
  • For anyone coming along later, `df_ex[which(eval(parse(text = paste0("test_df$",filt_var))) == 5),]` is the nonsense I had wrapped in the original body of the base-R function. – Matt Summersgill Oct 26 '17 at 22:49
  • Yep, `$` is just an interactive shortcut to `[[` essentially. Also, `fortune(106)` is relevant here - "*If the answer is parse() you should usually rethink the question.*" :-P – thelatemail Oct 26 '17 at 22:53
  • I actually really like the elegant base R solution proposed. I am however surprised that dplyr makes this so unnecessarily complex. – matsuo_basho Oct 27 '17 at 02:35
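Following the `[[` remark in the comments, here is a minimal base R sketch that extracts the column with `[[` and wraps the comparison in `which()`, so that missing values in the column do not produce NA rows. The data frame `df_na` and the function name `testfun_base` are made up for illustration.

```r
# Example data with a missing value in the filter column (assumed data).
df_na <- data.frame(a = c(1:4, NA, 5), b = 11:16)

# Hypothetical base R variant: filt_var is a string naming the column.
testfun_base <- function(test_df, filt_var) {
  # which() keeps only positions where the comparison is TRUE and drops NAs
  keep <- which(test_df[[filt_var]] == 5)
  test_df[keep, , drop = FALSE]
}

res_base <- testfun_base(df_na, "a")
```

With plain logical indexing, the NA in column a would come back as an extra row of NAs; `which()` avoids that.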

Sometimes the scoped variants of the verbs can be used in place of tidyeval when writing simple functions.

For example, if you want to pass the name as a string in your function, filter_at is an option.

Below is what using filter_at looks like with your variable as a string. You pass the variables you want to filter on and then give the predicate function within either all_vars or any_vars. When filtering on a single variable, it doesn't matter which of those you use.

filter_at(df_ex, "a", all_vars(. == 5) )

  a  b
1 5 15

filter_at can easily be used in a function.

testfun = function(test_df, filt_var){

    test_df %>%
        dplyr::filter_at(filt_var, all_vars(. == 5) )
}

testfun(df_ex, "a")

  a  b
1 5 15
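In more recent dplyr releases the scoped verbs such as filter_at are superseded. A roughly equivalent sketch, assuming dplyr 1.0.4 or later, uses `if_all()` together with `all_of()`. The function name `testfun_if_all` is made up for illustration.

```r
library(dplyr)

df_ex <- data.frame(a = 1:10, b = 11:20)

# Hypothetical variant: filt_var is a character vector of column names.
# all_of() selects those columns; if_all() requires the predicate to hold
# for every selected column.
testfun_if_all <- function(test_df, filt_var) {
  test_df %>%
    dplyr::filter(dplyr::if_all(dplyr::all_of(filt_var), ~ .x == 5))
}

res_if_all <- testfun_if_all(df_ex, "a")
```

As with all_vars and any_vars, the choice between `if_all()` and `if_any()` only matters once more than one column is selected.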
aosmith

Base NSE appears to work, too:

testfun2 <- function(test_df, filt_var){
  filt_var_mod <- substitute(filt_var)
  test_df %>% 
    dplyr::filter(eval(filt_var_mod) == 5)
}

testfun2(df_ex, a)


  a  b
1 5 15
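The same idea can be sketched without dplyr at all, by evaluating the captured expression with the data frame as the evaluation environment. This is only an illustration under that assumption; the function name `testfun_base_nse` is made up.

```r
df_ex <- data.frame(a = 1:10, b = 11:20)

# Hypothetical base-only variant: filt_var is typed without quotes.
testfun_base_nse <- function(test_df, filt_var) {
  # Capture the unevaluated argument, e.g. the symbol a
  filt_expr <- substitute(filt_var)
  # Evaluate it with the columns of test_df in scope, falling back to the
  # caller's environment for anything that is not a column
  keep <- eval(filt_expr, test_df, parent.frame()) == 5
  test_df[which(keep), , drop = FALSE]
}

res_base_nse <- testfun_base_nse(df_ex, a)
```

Passing `parent.frame()` as the enclosure means names that are not columns are looked up where the function was called, not inside the function body.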
Sebastian Sauer