
Consider the following R code:

```r
library(dplyr)  # attaches dplyr, which also provides the %>% pipe
y1 <- dataset %>% dplyr::filter(W == 1)
```

This works, but there seems to be some magic here. Usually, when we have an expression like foo(bar), we should be able to do this:

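```r
baz <- bar  # assign first (the original used `<=`, which is a comparison)
foo(baz)    # under standard evaluation, equivalent to foo(bar)
```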

However, in the snippet above we cannot evaluate `W == 1` outside of `dplyr::filter()`: `W` is not a defined variable.
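A minimal base-R sketch reproducing the observation, assuming a small hypothetical data frame called `dataset` with a numeric column `W` (the question does not show the real data):

```r
# Hypothetical stand-in for the question's `dataset`
dataset <- data.frame(W = c(1, 0, 1, 0), Y = c(10, 20, 30, 40))

# Standard evaluation: `W == 1` is evaluated immediately in the calling
# environment, where no variable `W` exists, so R signals an error.
top_level <- tryCatch(W == 1, error = function(e) conditionMessage(e))
print(top_level)

# Referring to the column explicitly through the data frame works
mask <- dataset$W == 1
print(mask)
print(dataset[mask, ])
```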

What's going on?

Adam Bethke
Yatharth Agarwal
  • `W` only exists in the scope of `dataset` - so you can evaluate `dataset$W == 1` in the same way. – thelatemail May 08 '18 at 00:54
  • If I'm understanding the question correctly, this is also related to non-standard evaluation. There's a good chapter on the subject: http://adv-r.had.co.nz/Computing-on-the-language.html – Adam Bethke May 08 '18 at 01:05
  • @AdamBethke Thank you! If you paste the relevant bits from that link into an answer, I'm happy to accept it. – Yatharth Agarwal May 08 '18 at 03:56

1 Answer


dplyr uses non-standard evaluation (NSE) to make the columns of the data frame argument accessible to its functions without quoting them or using dataframe$column syntax. Basically:

[Non-standard evaluation] is a catch-all term that means they don’t follow the usual R rules of evaluation. Instead, they capture the expression that you typed and evaluate it in a custom way. [1]
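The "capture the expression" half of that definition can be seen with base R's `substitute()`. This is a minimal sketch; `capture_expr()` is a hypothetical helper, not part of dplyr. Because R evaluates arguments lazily, the argument is never forced, so no error occurs even though `W` does not exist:

```r
# Hypothetical helper: returns the expression it was given instead of its value
capture_expr <- function(x) {
  substitute(x)  # the unevaluated argument expression
}

e <- capture_expr(W == 1)  # no error, even though `W` is not defined
print(e)          # W == 1
print(class(e))   # "call"
print(deparse(e)) # "W == 1"
```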

In this case, dplyr::filter() captures the expression you pass it (W == 1) without evaluating it, then evaluates it with the columns of the data frame in scope, so that W refers to dataset$W. You can't take W and use it elsewhere because that scope exists only inside the call to the function.
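A toy sketch of that capture-then-evaluate step in base R. `toy_filter()` and the small `dataset` are hypothetical; dplyr's real implementation uses rlang data masks rather than this code, but the basic idea is the same: evaluate the captured expression with the data frame as the environment, falling back to the caller's environment for names that are not columns.

```r
dataset <- data.frame(W = c(1, 0, 1, 0), Y = c(10, 20, 30, 40))

# Hypothetical toy filtering verb (NOT dplyr's actual code)
toy_filter <- function(df, cond) {
  cond_expr <- substitute(cond)  # capture e.g. `W == 1` unevaluated
  # Columns of `df` are looked up first; other names fall back to the caller
  keep <- eval(cond_expr, envir = df, enclos = parent.frame())
  # Like dplyr::filter(), drop rows where the condition is NA
  df[!is.na(keep) & keep, , drop = FALSE]
}

y1 <- toy_filter(dataset, W == 1)
print(y1)

# Outside the call, `W` is still undefined
print(exists("W"))
```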


NSE makes a trade-off: functions that evaluate their arguments in a custom scope are convenient at the console, but harder (and sometimes unusable) when you write other functions that call them:

This is an example of the general tension between functions that are designed for interactive use and functions that are safe to program with. A function that uses substitute() might reduce typing, but it can be difficult to call from another function. [2]

For example, if you wanted to write a function that runs the same code but swaps out W == 1 for W == 0 (or a completely different condition), NSE would make that harder, because the condition can't simply be passed along as an ordinary argument.
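A base-R sketch of that difficulty, again using a hypothetical `toy_filter()` rather than dplyr itself. A naive wrapper hands the NSE function the symbol `condition` instead of `W == 0`, so the condition ends up being evaluated where `W` does not exist. One possible workaround, shown as an assumption rather than the recommended dplyr approach, splices the captured expression into the call with `bquote()`:

```r
dataset <- data.frame(W = c(1, 0, 1, 0), Y = c(10, 20, 30, 40))

# Hypothetical toy NSE verb (NOT dplyr's actual code)
toy_filter <- function(df, cond) {
  keep <- eval(substitute(cond), envir = df, enclos = parent.frame())
  df[!is.na(keep) & keep, , drop = FALSE]
}

# Naive wrapper: toy_filter() captures the symbol `condition`, which is then
# forced in the caller's environment, where `W` is not defined.
filter_naive <- function(df, condition) {
  toy_filter(df, condition)
}
naive <- tryCatch(filter_naive(dataset, W == 0),
                  error = function(e) conditionMessage(e))
print(naive)  # an "object 'W' not found" message

# Workaround: capture the caller's expression and splice it into the call,
# so toy_filter() receives `W == 0` itself.
filter_fixed <- function(df, condition) {
  cond_expr <- substitute(condition)
  eval(bquote(toy_filter(df, .(cond_expr))))
}
y0 <- filter_fixed(dataset, W == 0)
print(y0)
```

This workaround evaluates the condition in filter_fixed()'s environment rather than the original caller's, so a condition that refers to a local variable of some calling function can still fail. Tidy evaluation's quosures address this by carrying the expression together with the environment it came from.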

In 2017 the tidyverse began addressing this with tidy evaluation, implemented in the rlang package.

Adam Bethke