
In the current version of dplyr, select arguments can be passed by value:

library(dplyr)

variable <- "Species"
iris %>% 
    select(variable)

#       Species
#1       setosa
#2       setosa
#3       setosa
#4       setosa
#5       setosa
#6       setosa
#...

But group_by arguments cannot be passed by value:

iris %>% 
    group_by(variable) %>% 
    summarise(Petal.Length = mean(Petal.Length))

# Error in grouped_df_impl(data, unname(vars), drop) : 
# Column `variable` is unknown

The documented dplyr::select behaviour is

iris %>% select(Species)

And the documented dplyr::group_by behaviour is

iris %>% 
    group_by(Species) %>% 
    summarise(Petal.Length = mean(Petal.Length))

  • Why are select and group_by different with respect to passing arguments by value?
  • Why is the first select call working and will it continue to work in the future?
  • Why is the first group_by call not working? I'm trying to figure out what combination of quo(), enquo() and !! I should use to make it work.

I need this because I would like to write a function that takes a grouping variable as an input parameter. If possible, the grouping variable should be given as a character string, because two other function parameters are already given as character strings.

Paul Rougieux
  • Isn't this part of the effort to [use tidy evaluation semantics instead of standard evaluation](http://dplyr.tidyverse.org/reference/se-deprecated.html)? – Dan Aug 14 '17 at 16:03
  • In `browseVignettes(package = "dplyr")`, you'll find one on programming, which covers what is/will be idiomatic, anyways. – Frank Aug 14 '17 at 16:05
  • I read the [dplyr vignette on programming](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html) a few days ago, now reading the [rlang vignette on tidy evaluation](https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html). – Paul Rougieux Aug 15 '17 at 07:18
  • `group_by(get(variable))` should get it to work but not sure why `select` and `group_by` are different in this respect. – pentandrous Oct 26 '17 at 15:37

1 Answer


To pass a string as a symbol or as unevaluated code, you first have to convert it to a symbol or an expression. You can use sym or parse_expr from rlang to do the conversion, and then use !! to unquote the result:

library(dplyr)

variable <- rlang::sym("Species")
# variable <- rlang::parse_expr("Species")

iris %>% 
  group_by(!! variable) %>% 
  summarise(Petal.Length = mean(Petal.Length))

!! is a shortcut for UQ(), which unquotes the expression or symbol. Unquoting inserts the content of variable (here the symbol Species) into the call, so that group_by sees Species rather than the name variable and evaluates it in the context of the data.
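Since the question asks for a function that takes the grouping variable as a character string, the same pattern can be wrapped in a function. This is only a minimal sketch, assuming dplyr and rlang are installed; the function name `mean_by`, its parameter names and the output column `mean_value` are hypothetical and not part of the original post:

```r
library(dplyr)

# Hypothetical helper (name and parameters are illustrative only):
# group `data` by the column named in `group_var` and average the
# column named in `value_var`. Both names are given as strings.
mean_by <- function(data, group_var, value_var) {
  group_sym <- rlang::sym(group_var)  # string -> symbol
  value_sym <- rlang::sym(value_var)  # string -> symbol
  data %>%
    group_by(!! group_sym) %>%                    # becomes group_by(Species)
    summarise(mean_value = mean(!! value_sym))    # becomes mean(Petal.Length)
}

result <- mean_by(iris, "Species", "Petal.Length")
```

Converting the strings inside the function keeps the interface string-based for the caller, while the dplyr verbs still receive ordinary column symbols.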

Difference between sym and parse_expr and which one to use when?

The short answer: it doesn't matter in this case.

The long answer:

A symbol is a way to refer to an R object, basically the "name" of an object. So sym is similar to as.name in base R. parse_expr, on the other hand, transforms some text into an R expression. This is similar to parse(text = ...) in base R.

Expressions can be any R code, not just code that references R objects. So you can parse code that references an R object, but you can't usefully turn arbitrary code into a symbol: sym treats the whole string as a single name, so evaluating the result fails unless an object with exactly that name exists.

In general, you will use sym when your string refers to an object (although parse_expr would also work), and use parse_expr when you are trying to parse any other R code for further evaluation.

For this particular use case, variable is supposed to be referencing an object, so turning it into a sym would work. On the other hand, parsing it as an expression would also work because that is the code that is going to be evaluated inside group_by when being unquoted by !!.
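The distinction can be checked directly in the console. The following is a small sketch, assuming rlang is installed; the variable names `s`, `e`, `cond` and `odd` are made up for illustration:

```r
library(rlang)

s <- sym("Species")         # a symbol
e <- parse_expr("Species")  # parsing a bare name also yields a symbol
same <- identical(s, e)     # TRUE: both approaches agree for a plain column name

cond <- parse_expr("Sepal.Length > 5")  # parsed as a call to `>`
cond_is_call <- is_call(cond)           # TRUE

odd <- sym("Sepal.Length > 5")    # a single symbol whose name is the whole string
odd_is_symbol <- is_symbol(odd)   # TRUE: not a comparison, just an unusual name
```

For a plain column name the two functions return the same object, which is why the choice does not matter for the group_by example; they only diverge once the string contains more than a name.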

acylam
  • Thanks, using `rlang::sym` and `!!` I can pass the grouping variable as a character string. – Paul Rougieux Nov 16 '17 at 10:38
  • It took me a while to understand why `!!` is necessary. The [rlang vignette on tidy evaluation](https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html) gives an example that helped me understand: "[...] with quasiquotation: users can bypass symbolic evaluation completely by unquoting values. For instance, the following expressions are completely equivalent: # Taking an expression: `dplyr::mutate(mtcars, cyl2 = cyl * 2)` # Taking a value: `var <- mtcars$cyl * 2`; `dplyr::mutate(mtcars, cyl2 = !! var)`." – Paul Rougieux Nov 16 '17 at 10:38
  • Thanks for the explanation. Do you know what the difference between `sym` and `parse_expr` is? Why should I prefer one to the other? – tmastny Mar 13 '18 at 17:00
  • @tmastny Added an explanation of the difference between the two. Hope this helps. – acylam Mar 13 '18 at 17:20