
I am learning about programming with tidy evaluation and non-standard evaluation and have been trying to work out how to constrain the possible states of an argument in a function.

For instance given a data set:

library(dplyr)

set.seed(123)
# tibble() replaces the deprecated data_frame()
data <- tibble(GROUP_ONE = sample(LETTERS[1:3], 10, replace = TRUE),
               GROUP_TWO = sample(letters[4:6], 10, replace = TRUE),
               result = rnorm(10))

I can create a function which has an argument I use to group the data using a quosure like so:

my_function <- function(data, group = GROUP_ONE){

  require(dplyr)
  require(magrittr)

  group <- enquo(group)

  result <- data %>% 
    group_by(!!group) %>% 
    summarise(mean=mean(result))

  return(result)
}

and this does what I want

my_function(data)

# A tibble: 3 x 2
  GROUP_ONE       mean
      <chr>      <dbl>
1         A  1.5054975
2         B  0.2817966
3         C -0.5129904

and I can supply a different group:

my_function(data, group = GROUP_TWO)

# A tibble: 3 x 2
  GROUP_TWO       mean
      <chr>      <dbl>
1         d -0.3308130
2         e  0.2352483
3         f  0.7347437

However, I cannot group by a column that is not present in the data.

e.g.

 my_function(data, group = GROUP_THREE)

Error in grouped_df_impl(data, unname(vars), drop) : Column GROUP_THREE is unknown

I would like to add a step at the beginning of the function so that it stops with a custom error message if the group argument is not GROUP_ONE or GROUP_TWO, something like:

if(!group %in% c(~GROUP_ONE, ~GROUP_TWO)) stop("CUSTOM ERROR MESSAGE")

except this does not work, as apparently you can't put quosures in a vector. It should be possible to convert the quosure to a string somehow and compare it against a vector of strings, but I can't figure out how.

How is this done?

Agaz Wani
G_T

2 Answers


I think you need quo_name (from dplyr or rlang), which transforms a quoted symbol to a string:

my_function <- function(data, group = GROUP_ONE){

    require(dplyr)
    require(magrittr)

    group <- enquo(group)

    if(!quo_name(group) %in% names(data)) stop("CUSTOM ERROR MESSAGE")

    result <- data %>% 
        group_by(!!group) %>% 
        summarise(mean=mean(result))

    return(result)
}

# > my_function(data, GROUP_THREE)
# Error in my_function(data, GROUP_THREE) : CUSTOM ERROR MESSAGE

Edit

As noted by lionel in the comments: besides quo_name, there are other alternatives, including base R's as.character and as_string from rlang.
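To make the difference between these helpers concrete, here is a minimal sketch assuming only that rlang is installed. The variable names `sym_quo` and `call_quo` are made up for illustration. It shows that quo_name() accepts any expression, whereas is_symbol() on the extracted expression distinguishes a bare column name from a call.

```r
library(rlang)

sym_quo  <- quo(GROUP_ONE)             # quosure wrapping a bare symbol
call_quo <- quo(as.factor(GROUP_ONE))  # quosure wrapping a call

# quo_name() turns any expression into text, so both calls succeed
quo_name(sym_quo)
quo_name(call_quo)

# get_expr() extracts the wrapped expression; is_symbol() tells them apart
is_symbol(get_expr(sym_quo))   # TRUE
is_symbol(get_expr(call_quo))  # FALSE

# as_string() is meant for symbols, so check is_symbol() first
as_string(get_expr(sym_quo))
```

With the call quosure, a `%in% names(data)` test on the quo_name() result fails even though group_by() itself would accept the expression.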

mt1022
  • `quo_name()` is for transforming arbitrary expressions to text so that isn't robust for checking symbols. – Lionel Henry Aug 23 '17 at 06:09
  • @lionel, thanks for the note. I think the problem is not to check for a symbol but to check whether the column exists in the data. I see no case in which `quo_name` cannot do this job. In addition, I don't have to `library(rlang)` when using `quo_name`. – mt1022 Aug 23 '17 at 06:28
  • You never have to attach rlang, you can use equivalent base functions or namespace-qualified rlang functions. I think checking for symbols is part of the question here, because with your approach you're going to return an error about a symbol not found when people supply things like `as.factor(col)`, which is wrong. – Lionel Henry Aug 23 '17 at 06:52
  • @lionel, I see. `as.character` also works for this case. – mt1022 Aug 23 '17 at 06:54

quo_name() is for transforming arbitrary expressions to text so that isn't robust for checking symbols.

If you expect only symbols, and if those symbols should only represent data frame columns, you don't need quosures. In this case you can capture with enexpr() (and there will be ensym() in the next version of rlang):

group <- enexpr(group)
stopifnot(is_symbol(group))  # Or some custom error

Then turn it to a string for the check:

as_string(group) %in% names(df)

You can then unquote the symbol just like you unquote the quosure.

df %>% group_by(!! group)
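Put together, the steps above might look like the following sketch. It assumes dplyr and rlang are installed and that the data has a numeric `result` column, as in the question. The error message text is a placeholder.

```r
library(dplyr)
library(rlang)

my_function <- function(data, group = GROUP_ONE) {
  # Capture the raw expression; no quosure is needed for bare column names
  group <- enexpr(group)

  # Reject anything that is not a symbol naming an existing column
  if (!is_symbol(group) || !as_string(group) %in% names(data)) {
    stop("`group` must be a bare column name present in `data`", call. = FALSE)
  }

  # Unquote the symbol exactly as one would unquote a quosure
  data %>%
    group_by(!!group) %>%
    summarise(mean = mean(result))
}
```

To allow only GROUP_ONE or GROUP_TWO, as the question asks, the `names(data)` part of the check could be swapped for `c("GROUP_ONE", "GROUP_TWO")`.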

Alternatively if you need quosures you can check the contained expression:

expr <- get_expr(group)  # group is the quosure returned by enquo(group)
is_symbol(expr) && as_string(expr) %in% names(df)

That should be the preferred UI because group_by() has mutate semantics, so you can do stuff like this: df %>% group_by(as.factor(col)). This also means that it's hopeless to try to provide custom error messages, unless you want to capture the error, parse it to make sure it's a "symbol not found" one, and rethrow another error.
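For the capture-and-rethrow route, here is a rough sketch under stated assumptions: it keeps the quosure so that expressions such as `as.factor(col)` still work, and it wraps whatever error group_by() raises rather than parsing it, because the wording of dplyr's "unknown column" message varies between versions. The "Could not group by" prefix is an invented placeholder.

```r
library(dplyr)
library(rlang)

my_function <- function(data, group = GROUP_ONE) {
  group <- enquo(group)

  # group_by() evaluates the grouping expression right away, so a bad
  # column name fails here and can be rethrown with a custom prefix
  grouped <- tryCatch(
    group_by(data, !!group),
    error = function(e) {
      stop("Could not group by `", quo_name(group), "`: ",
           conditionMessage(e), call. = FALSE)
    }
  )

  summarise(grouped, mean = mean(result))
}
```

This cannot tell a missing column apart from any other failure inside the grouping expression. Doing so would require inspecting the original message, which is the fragile part the answer warns about.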

Lionel Henry
  • Thanks. Could you explain a little more about symbols, and specifically why raw expressions are more robust in this context? I've read http://dplyr.tidyverse.org/articles/programming.html where it says `enquo()` is equivalent to `base::substitute()` but I've also just found http://rlang.tidyverse.org/reference/expr.html where `enexpr()` is also said to be equivalent to `base::substitute()`. – G_T Aug 23 '17 at 09:38
  • When I try `my_function <- function(data, group = GROUP_ONE){ group <- rlang::enexpr(group) as.factor(group) } my_function(data, GROUP_ONE)` I get an error `Error in unique.default(x, nmax = nmax) : unique() applies only to vectors` – G_T Aug 23 '17 at 09:39
  • You can work with symbols if you want, it's just that you're limiting `group_by()` functionality which supports expressions in addition to symbols: `group_by(mtcars, as.factor(cyl))`. If you choose to work with symbols that always refer to data frame objects you don't need quosures. If you want to work with arbitrary expressions you need them because they should be evaluated in the context of where they were typed. – Lionel Henry Aug 23 '17 at 10:51
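The two points in this exchange can be illustrated with a small sketch, assuming rlang is installed. The names `capture_sym`, `with_quosure`, `caller`, `x` and `k` are invented for the example. A captured symbol is only a name, so it has to be evaluated against the data before functions like as.factor() can use it. A quosure additionally remembers the environment in which the expression was typed.

```r
library(rlang)

df <- data.frame(GROUP_ONE = c("A", "B", "A"), x = c(1, 2, 3))

# A captured symbol is a name, not the column it refers to
capture_sym <- function(group) enexpr(group)
sym <- capture_sym(GROUP_ONE)
is_symbol(sym)  # TRUE

# Evaluate it against the data first, then apply as.factor()
col <- eval_tidy(sym, data = df)
levels(as.factor(col))

# A quosure carries its environment, so `k` is found where it was typed
with_quosure <- function(data, expr) {
  q <- enquo(expr)
  eval_tidy(q, data = data)
}

caller <- function() {
  k <- 10
  with_quosure(df, x * k)  # x comes from df, k from caller()
}
caller()
```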