
I stumbled upon this behaviour and do not quite understand it. Could someone, please, shed some light?

I have written the following function which gives the following error:

> MyFilter <- function(data, filtersVector) {
    filtersVector <- quo(filtersVector)
    result <- data %>% filter(Species %in% !!filtersVector)
    result
  }

> MyFilter(iris, c("setosa", "virginica"))
Error in filter_impl(.data, quo) : 
Evaluation error: 'match' requires vector arguments.

However, if I modify it in the following way it is working as expected:

> MyFilter <- function(data, filtersVector) {
    otherName <- quo(filtersVector)
    result <- data %>% filter(Species %in% !!otherName)
    result
  }

> MyFilter(iris, c("setosa", "virginica"))
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            5.1         3.5          1.4         0.2    setosa
2            4.9         3.0          1.4         0.2    setosa
3            4.7         3.2          1.3         0.2    setosa
4            4.6         3.1          1.5         0.2    setosa
5            5.0         3.6          1.4         0.2    setosa
6            5.4         3.9          1.7         0.4    setosa

I also realize that inside a function I should be using enquo() instead, and that works fine:

> MyFilter <- function(data, filtersVector) {
        filtersVector<- enquo(filtersVector)
        result <- data %>% filter(Species %in% !!filtersVector)
        result
      }

> MyFilter(iris, c("setosa", "virginica"))
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            5.1         3.5          1.4         0.2    setosa
2            4.9         3.0          1.4         0.2    setosa
3            4.7         3.2          1.3         0.2    setosa
4            4.6         3.1          1.5         0.2    setosa
5            5.0         3.6          1.4         0.2    setosa
6            5.4         3.9          1.7         0.4    setosa

However, I am still puzzled by the above behaviour, and any explanation will be appreciated.

deann

2 Answers


TLDR: In the first version, you have created a self-reference (a symbol that points to itself). The other versions work but you actually don't need quosures or capturing arguments here because you are not referring to data frame columns. This also explains why both the quo() and the enquo() versions work the same. You can just pass the argument in the normal way, without any quoting, though it's still a good idea to unquote with !! to avoid any data masking bug.

You can use qq_show() around the filter() call to discover the differences in syntax:

MyFilter <- function(data, filtersVector) {
  filtersVector <- quo(filtersVector)

  rlang::qq_show(
    result <- data %>% filter(Species %in% !!filtersVector)
  )
}

MyFilter(iris, c("setosa", "virginica"))
#> result <- data %>% filter(Species %in% (^filtersVector))

So here we are asking filter() to find the rows where Species matches the elements of filtersVector. There is no filtersVector column in your data frame, so it looks for a definition in the quosure environment. You have created a quosure with quo(), which records your expression (in this case the symbol filtersVector) and your environment (the environment of your function). So it looks up the filtersVector object, which now contains a symbol referring to itself. It is evaluated only once so there is no infinite loop, but you're effectively trying to compare a vector to a symbol, which is a type error:

"setosa" %in% quote(filtersVector)
#> Error in match(x, table, nomatch = 0L) :
#> 'match' requires vector arguments

In your second try, you give another name to the quosure. It now works because filtersVector, in the environment of your function, still represents the argument that was passed to it (a vector).
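The difference between the first two versions can be reproduced without dplyr. This is only a minimal sketch using rlang; the helper names self_ref() and other_name() are made up for illustration and do not appear in the question:

```r
library(rlang)

# Hypothetical helper: overwrites its own argument with a quosure of itself.
self_ref <- function(x) {
  x <- quo(x)    # `x` now holds a quosure wrapping the symbol `x`
  eval_tidy(x)   # looking up `x` finds that quosure, not the original vector
}

# Hypothetical helper: stores the quosure under a different name.
other_name <- function(x) {
  q <- quo(x)    # `x` is untouched, so it still holds the argument
  eval_tidy(q)   # looking up `x` finds the vector that was passed in
}

is_quosure(self_ref(c("setosa", "virginica")))  # a quosure comes back, not a vector
other_name(c("setosa", "virginica"))            # the character vector comes back
```

In self_ref() the lookup returns a language object, which is what %in% rejects in the first version of MyFilter().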

In the third try, you use enquo(). Rather than capturing your expression and your environment, enquo() captures the expression and the environment of the user of your function. Let's use qq_show() again to see the difference:

MyFilter <- function(data, filtersVector) {
  filtersVector<- enquo(filtersVector)

  rlang::qq_show(
    data %>% filter(Species %in% !!filtersVector)
  )
}

MyFilter(iris, c("setosa", "virginica"))
#> data %>% filter(Species %in% (^c("setosa", "virginica")))

Now, the quosure contains a call that creates a vector on the spot, which %in% understands perfectly.
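To inspect what each function captures without going through filter(), the expressions can be pulled out of the quosures directly. A small sketch, assuming rlang is attached; capture_quo() and capture_enquo() are hypothetical names used only here:

```r
library(rlang)

# Hypothetical helpers that return the captured quosure.
capture_quo   <- function(arg) quo(arg)    # captures the symbol `arg` itself
capture_enquo <- function(arg) enquo(arg)  # captures what the caller typed

quo_get_expr(capture_quo(c("setosa", "virginica")))
#> arg
quo_get_expr(capture_enquo(c("setosa", "virginica")))
#> c("setosa", "virginica")
```

quo() records the argument's name inside the function, while enquo() records the caller's expression.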

Note how you're not actually referring to data frame columns though. You're passing vectors. This means you don't need any quosure at all, and you don't need to capture the expression passed to an argument. enquo() is only useful to delay evaluation until the very end, so it can be evaluated within the data frame. If the quo() and enquo() versions produce the same result, that's a good indication you don't need any quoting at all. Since there is no need for them, let's simplify the function by removing quosures from the equation:

MyFilter <- function(data, filtersVector) {
  data %>% filter(Species %in% filtersVector)
}

MyFilter(iris, c("setosa", "virginica"))
#> # A tibble: 100 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ... with 90 more rows

It works! But what happens if the data frame contains a filtersVector column? It would take precedence over the object from the environment:

iris %>%
  mutate(filtersVector = "parasite vector") %>%
  MyFilter(c("setosa", "virginica"))
#> # A tibble: 0 x 6
#> # ... with 6 variables: Sepal.Length <dbl>, Sepal.Width <dbl>,
#> #   Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>, filtersVector <chr>

So it's still a good idea to unquote, because that will evaluate the vector right away and stick it inside the filter expression. It can no longer be masked by a column. The inlining is shown by qq_show():

MyFilter <- function(data, filtersVector) {
  rlang::qq_show(
    data %>% filter(Species %in% !!filtersVector)
  )
}
MyFilter(iris2, c("setosa", "virginica"))
#> data %>% filter(Species %in% <chr: "setosa", "virginica">)
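To see the effect on actual results rather than on the printed expression, the two variants can be run side by side on a data frame that carries the conflicting column. A minimal sketch, assuming dplyr is attached; the names masked(), inlined() and iris_parasite are invented for this illustration:

```r
library(dplyr)

# Hypothetical variant without unquoting: a column named filtersVector wins.
masked <- function(data, filtersVector) {
  data %>% filter(Species %in% filtersVector)
}

# Hypothetical variant with unquoting: the vector is inlined before filtering.
inlined <- function(data, filtersVector) {
  data %>% filter(Species %in% !!filtersVector)
}

# Same "parasite" column as in the example above.
iris_parasite <- iris %>% mutate(filtersVector = "parasite vector")

nrow(masked(iris_parasite, c("setosa", "virginica")))   # 0 rows: the column masks the argument
nrow(inlined(iris_parasite, c("setosa", "virginica")))  # 100 rows: the argument is used
```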
Lionel Henry

We can use syms() from rlang when we pass a vector of quoted strings instead of unquoted names:

MyFilter <- function(data, filtersVector) {
  filtersVector <- rlang::syms(filtersVector)
  data %>%
    filter(Species %in% !!filtersVector)
}

out <- MyFilter(iris, c("setosa", "virginica"))
dim(out)
#[1] 100   5
akrun
  • I realize there is some tidyeval structure. However, I am interested in why the above does not work as expected. Is it sth connected with the function args being in another environment than the vars defined within the function or sth else? – deann Oct 30 '18 at 15:26
  • @deann First of all, I don't understand why you need `quo` or `enquo` or `sym` to pass a quoted vector. Simply `filter(Species %in% filtersVector)` would be enough. `enquo` is similar to a replacement for `substitute`, and both `quo`/`enquo` convert an object to a quosure. But in this case, I don't see any need for those – akrun Oct 30 '18 at 15:32
  • Ok, I realize the example is stupid.. however I am interested in why does this happen or where to look for an explanation? – deann Oct 30 '18 at 15:47
  • @deann the way in which those functions are constructed i.e.`quo` calls `enquo` and if you look at `enquo` it is calling `rlang_quo` with `substitute`. These are functions used for a particular purpose. I am not sure whether it is tested widely for these kind of situations – akrun Oct 30 '18 at 15:49
  • This only works because of R's weird coercion rules. You're effectively doing this: `"setosa" %in% list(quote(setosa), quote(virginica))`. The `%in%` operator coerces the list elements to character because it doesn't know what to do with symbols. There is no need for symbols or quoting or expressions here, see my answer. – Lionel Henry Oct 30 '18 at 16:06
  • @lionel The part you said there is no need for symbols or quoting. I already said that in the comments – akrun Oct 30 '18 at 16:08
  • oops sorry I missed it – Lionel Henry Oct 30 '18 at 16:35