Using dplyr inside a function with an argument named like a column in dataframe

Question

Could someone explain why f1 behaves differently from f2 in this example:

```r
library(dplyr)

f1 <- function(data, year){
  data %>% 
    filter(year == year)
}

f2 <- function(data, y){
  data %>% 
    filter(year == y)
}

f3 <- function(data, year){
  data %>% 
    filter(!!year == year)
}

df <- data.frame(year = 2000:2005)

f1(df, 2005)
#>   year
#> 1 2000
#> 2 2001
#> 3 2002
#> 4 2003
#> 5 2004
#> 6 2005
f2(df, 2005)
#>   year
#> 1 2005
f3(df, 2005)
#>   year
#> 1 2005
```

I know this has something to do with tidy evaluation and I had a look at the vignette on Programming with dplyr. But the example here seems somewhat different.

I see that the problem can be fixed by using !! in f3, but I am not entirely sure what happens here. I would be interested to know if this is the optimal solution to the problem and if it is recommended to always use !! in similar situations.

It's because your argument has the same name as your column, so in the first example you are effectively doing `filter(data, 2005 == 2005)`, which is TRUE for every row. — Maël, Oct 06 '22 at 08:46
`filter()` looks for column names first, and for other variables only afterwards. If it finds a match, it won't keep looking. So your `f1` says "filter this data as long as the contents of the column 'year' match the contents of the column 'year'", which of course they always do. In `f3` the `!!` can be thought of as specifying "the contents of the variable...", so it filters where "the contents of the column 'year' match the contents of the variable `year`". [link to technical explanation](https://adv-r.hadley.nz/quasiquotation.html#unquoting-one-argument) — Paul Stafford Allen, Oct 06 '22 at 08:53
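A small sketch of that lookup order, using rlang directly. This is an assumption-laden illustration: it mimics the data masking that `filter()` performs by calling `rlang::eval_tidy()` with the data frame as the mask, rather than calling `filter()` itself. `expr()` captures code without evaluating it, and `!!` inserts the current value of `year` at capture time, so only the remaining bare `year` is looked up in the data.

```r
library(rlang)

df <- data.frame(year = 2000:2005)
year <- 2005  # an environment variable with the same name as the column

# Without !!: both `year`s are resolved in the data mask first,
# so the column is compared with itself and every row matches
same <- expr(year == year)
eval_tidy(same, data = df)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE

# With !!: the first `year` is replaced by its value (2005) when the
# expression is captured; only the second `year` is looked up in the data
unquoted <- expr(!!year == year)
print(unquoted)
#> 2005 == year
eval_tidy(unquoted, data = df)
#> [1] FALSE FALSE FALSE FALSE FALSE  TRUE
```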
Thanks a lot for your response. That's really useful. Is there a way to make sure that the second `year` in `year == year` is interpreted explicitly as an env-variable and *not* as a data-variable? — Phil, Oct 06 '22 at 09:02
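One way to make both sides explicit, sketched under the assumption of a reasonably recent dplyr: the `.data` and `.env` pronouns from rlang are available inside data-masking verbs such as `filter()`. `.data$year` always refers to the column and `.env$year` always refers to the variable in the calling environment, here the function argument. The name `f4` is hypothetical and simply follows the naming in the question.

```r
library(dplyr)

f4 <- function(data, year) {
  data %>%
    # .data$year is the column; .env$year is the function argument
    filter(.data$year == .env$year)
}

df <- data.frame(year = 2000:2005)
f4(df, 2005)
#>   year
#> 1 2005
```

Compared with `!!year`, the pronouns state the intent on both sides of the comparison, and `.data$year` raises an error if the column is missing instead of silently falling back to an environment variable.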
