
I am having trouble running non-standard evaluation (nse) expressions with the tidyr package.

Basically, what I want to do is to expand two columns that may be identical or not to achieve a dataframe with all possible combinations. The problem is that this will be a function, so I will not know the column name in advance.

Here is a minimal example:

library(tidyr)

dummy <- data.frame(x = c("ex1", "ex2"), y = c('cat1', 'cat2')) # dataset

tidyr::expand(dummy, x, y) # bare column names (NSE) work
tidyr::expand_(dummy, c("x", "y"))  # using the deprecated syntax works

# The following did not work:

  tidyr::expand(dummy, one_of('x'), y) # using select syntax
  tidyr::expand(dummy, vars('x', 'y')) # mutate_at style
  tidyr::expand(dummy, .data[[cnae_agg]], .data[[cnae_agg]])  # mutate current style  
  tidyr::expand(dummy, sym('x'), sym('y')) # trying to convert to symbols
  tidyr::expand(dummy, !!!enquos('x', 'y')) 
  tidyr::expand(dummy, !!('x'), y) # unquoting just one element
  tidyr::expand(dummy, !!!c("x", "y")) # unquote-splicing a vector of strings
  tidyr::expand(dummy, !!!c(quo("x"), quo("y"))) # unquote-splicing a vector of quosured strings

So, I have two questions:

1) What is the correct syntax to be applied with the tidyr expand function?

2) I probably read the Advanced R chapter on quasiquotation several times already, but it is still not clear to me why there are several different 'styles' to use nse with the tidyverse, and where exactly to use each.

I can throw pretty much anything at select/summarise and it will work, but mutate reacts differently.

For example:

  # mutate
  mutate(dummy, new_var = .data[['x']]) # mutate basic style
  mutate(dummy, new_var = !!'x') # this just assigns the string 'x' to all rows


  # mutate at
  mutate_at(dummy, .vars=vars('y'), list(~'a')) # this works
  mutate_at(dummy, .vars=vars(!!'y'), list(~'a')) # this also works
  mutate_at(dummy, .vars=vars('y'), list(~`<-`(.,!!'x'))) # if we try to use unquote to create an assignment it does not work
  mutate_at(dummy, .vars=vars('y'), list(~`<-`(.,vars(!!'x')))) # even using vars, which works for variable selection, doesn't suffice

  # select 
  select(dummy, x) # this works
  select(dummy, 'x') # this works
  select_at(dummy, vars(!!'x')) # this works
  select_at(dummy, 'x') # this works
  select_at(dummy, !!'x') # this doesn't work

Which brings me to my second question.

Is there an updated guide with all the current syntaxes for the tidyverse style focusing on the differences in usage for each 'verb', such as in 'mutate' vs 'select' (i.e. when one works and the other doesn't)?

And how do I know whether to use the mutate or the select style of NSE in other tidyverse packages, such as tidyr?

Elijah
  • Not clear `mutate_at(dummy, .vars=vars('y'), list(~`<-`(.,!!'x')))` about the logic here. You are selecting a column 'y' and then is it assiging to a different column? In that case you can do that in a separate step with `rename` or `rename_at` – akrun Sep 23 '19 at 20:38
  • I agree that it would certainly be better to do that. I am just highlighting some operations that you could do with the usual mutate but which gets very confusing when using nse and the different flavors. For example, to make attribution with mutate is easy: mutate(dummy, x = y), but doing that using mutate_at and nse seems hard. – Elijah Sep 23 '19 at 20:47

2 Answers


We need to convert the strings to symbols and unquote-splice them (!!!)

tidyr::expand(dummy, !!! rlang::syms(c('x', 'y')))
# A tibble: 4 x 2
#  x     y    
#  <fct> <fct>
#1 ex1   cat1 
#2 ex1   cat2 
#3 ex2   cat1 
#4 ex2   cat2 

This is particularly useful when the column names are stored in a vector and we want to expand on them

nm1 <- c('x', 'y')
tidyr::expand(dummy, !!! rlang::syms(nm1))

In the other attempts, either the !!! or the conversion of the character vector to symbols is missing
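Since the goal in the question is a function that receives the column names, the same idea can be wrapped as follows. This is only a minimal sketch: `expand_cols` is a hypothetical helper name, not a tidyr function, and it assumes the rlang package is installed alongside tidyr.

```r
library(tidyr)

dummy <- data.frame(x = c("ex1", "ex2"), y = c("cat1", "cat2"))

# expand_cols() is a hypothetical helper, not part of tidyr.
# `cols` is a character vector of column names.
expand_cols <- function(data, cols) {
  # syms() turns each string into a symbol; !!! splices the symbols in as
  # separate arguments, as if the caller had typed expand(data, x, y)
  tidyr::expand(data, !!!rlang::syms(cols))
}

res <- expand_cols(dummy, c("x", "y"))
```

Here `res` should hold the same four combinations as the output shown above.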

akrun
  • Thanks Akrun! That solved it for me. I need some more clarification, though. Which are the cases where I have first to convert strings to symbols (using the `sym` terminology)? For example, I dont need to do that with `mutate`, but I recall doing something like that previously with `as.name` using `filter`. – Elijah Sep 23 '19 at 20:24
  • @Elijah If you are passing column names in a function, you can just pass the unquoted column name in `mutate/summarise`. Within `mutate_at`, pass the strings or unquoted names within `vars` – akrun Sep 23 '19 at 20:25
  • @Elijah Some of the functions got deprecated. `filter` also have different flavors with `filter_at`, `filter_all` etc. If you have a specific question it would be easier to answer because there are a lot of ways in which you can do the evaluation – akrun Sep 23 '19 at 20:26
  • Thanks Akrun. I need to meditate about the answers a bit more, though. I have three different versions of the `dplyr` syntax in my head, beginning with the old `lazyeval` way, so it is not clear to me yet what can I do and what I can't with what I know about the latest syntax. – Elijah Sep 23 '19 at 20:50
  • @Elijah Forget about the lazyeval old versions and now concentrate on the `quo/enquo/sym/ensym` etc. The ones with prefix `en` are used when you pass variables to a function – akrun Sep 23 '19 at 20:58

The updated guide on nse is the tidy evaluation guide. In particular, chapter 8 covers its relationship with dplyr, along with general patterns. In your case, there are several possible patterns, depending on what you want to expose to the user.

Pattern 1: Simply pass the dots to expand, giving the user full control of the underlying expand():

f <- function(...) {tidyr::expand(dummy, ...)}
f( x, y )    # End user specifies the columns via NSE

Pattern 2: Capture the user's input on a per-variable basis and pass it to expand() using the new "curly curly" operator:

g <- function( var1, var2 ) {tidyr::expand(dummy, {{var1}}, {{var2}})}
g( x, y )    # Once again, NSE, but the number of arguments is controlled
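
As a rough illustration of what the curly-curly operator does, `{{ var }}` can be read as shorthand for capturing the argument with `enquo()` and then unquoting it with `!!`. The sketch below assumes rlang 0.4.0 or later (where `{{` is available); `g_long` is a hypothetical name used only for comparison.

```r
library(tidyr)

dummy <- data.frame(x = c("ex1", "ex2"), y = c("cat1", "cat2"))

# Pattern 2, using {{ }}
g <- function(var1, var2) {
  tidyr::expand(dummy, {{ var1 }}, {{ var2 }})
}

# g_long is a hypothetical name: the same function in two explicit steps
g_long <- function(var1, var2) {
  v1 <- rlang::enquo(var1)  # capture the caller's expression unevaluated
  v2 <- rlang::enquo(var2)
  tidyr::expand(dummy, !!v1, !!v2)  # unquote it inside expand()
}

res_short <- g(x, y)
res_long  <- g_long(x, y)
```

Both calls should produce the same combinations of `x` and `y`.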

Pattern 3: Allow the user to provide arguments as variable names OR strings. Use rlang::ensyms to convert strings to variable names:

h <- function(...) {tidyr::expand(dummy, !!!rlang::ensyms(...))}

# The interface now works with strings or NSE
h( "x", "y" )
h( x, y )

Pattern 3b: If you want to disable NSE support and require that users supply arguments as strings, a minor modification of the above pattern does it:

h2 <- function(...) {tidyr::expand(dummy, !!!rlang::syms(list(...)))}
h2( "x", "y" )    # Strings OK
h2( x, y )        # Error: object 'x' not found

Note that NSE functions require quasiquotation to handle symbols stored inside external variables:

# Handling strings in external variables
str_name <- "x"
h( !!str_name, "y" )
h2( str_name, "y" )    # h2 doesn't support NSE; no !! needed

# Handling variable names as unevaluated expressions (NOT strings)
var_name <- quote(y)
f( x, !!var_name )
g( x, !!var_name )
h( x, !!var_name )

# Handling lists of variable names using !!! unquote-splice
# Works with functions that accept dots
arg_names <- rlang::exprs( x, y )
f( !!!arg_names )
h( !!!arg_names )
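
On the mutate-versus-select part of the question, the following sketch may help when the column name is held in a string. It assumes dplyr 1.0 or later (for `all_of()`) and that rlang is installed; the variable names `col`, `m1`, `m2` and `s1` are purely illustrative. Data-masking verbs such as mutate() compute on column values, so the string has to be looked up via `.data[[ ]]` or turned into a symbol, whereas tidy-select verbs such as select() accept names directly.

```r
library(dplyr)

dummy <- data.frame(x = c("ex1", "ex2"), y = c("cat1", "cat2"))
col <- "x"   # column name held as a string (illustrative)

# Data-masking verb: look the column up through the .data pronoun
m1 <- mutate(dummy, new_var = .data[[col]])

# Equivalent: convert the string to a symbol and unquote it
m2 <- mutate(dummy, new_var = !!rlang::sym(col))

# Tidy-select verb: all_of() selects columns named in a character vector
s1 <- select(dummy, all_of(col))
```

tidyr::expand() evaluates its arguments like mutate() does, which is why the symbol-based patterns above apply to it.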
Artem Sokolov
  • +1 for the link regarding tidyevaluation. Thanks for your answer Artem. However, I found something strange. If I try to pass the variable name to a string first, such as `str_name <- "x"`, none of the methods you showed seem to work. However, if I do ` i <- function(var1, var2){ tidyr::expand(dummy, !!sym(var1), !!sym(var2)) }` It does work. – Elijah Sep 23 '19 at 20:40
  • That's because the function is looking for a column called `str_name`. Use the `!!` operator to tell the functions to look for the column name stored *inside* `str_name` instead. Please see my edit. – Artem Sokolov Sep 23 '19 at 20:48
  • @Elijah: Added a "strings only" pattern, just in case. – Artem Sokolov Sep 23 '19 at 20:58
  • Thanks, I see now what I was missing before! – Elijah Sep 24 '19 at 00:14