11

Puzzle for the R cognoscenti: Say we have a data-frame:

df <- data.frame( a = 1:5, b = 1:5 )

I know we can do things like

with(df, a)

to get a vector of results.

But how do I write a function that takes an expression (such as a or a > 3) and does the same thing inside? That is, I want to write a function fn that takes a data-frame and an expression as arguments and returns the result of evaluating the expression "within" the data-frame as an environment.

Never mind that this sounds contrived (I could just use with as above); it is a simplified version of a more complex function I am writing. I tried several variants (using eval, with, envir, substitute, local, etc.) but none of them worked. For example, if I define fn like so:

fn <- function(dat, expr) {
  eval(expr, envir = dat)
}

I get this error:

> fn( df, a )
Error in eval(expr, envir = dat) : object 'a' not found

Clearly I am missing something subtle about environments and evaluation. Is there a way to define such a function?

Prasad Chalasani

4 Answers

12

The lattice package does this sort of thing in a different way. See, e.g., lattice:::xyplot.formula.

fn <- function(dat, expr) {
  eval(substitute(expr), dat)
}
fn(df, a)             # 1 2 3 4 5
fn(df, 2 * a + b)     # 3 6 9 12 15
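
To see why this works, it can help to look at what substitute returns inside a function. The following is only a sketch: show_expr and fn2 are hypothetical names introduced here, not part of the answer above, and fn2 passes enclos = parent.frame() on the assumption that you also want variables from the caller (such as threshold below) to be found when they are not columns of the data frame.

```r
df <- data.frame(a = 1:5, b = 1:5)

# Hypothetical helper: return the argument as an unevaluated language object
show_expr <- function(expr) substitute(expr)

sym_class  <- class(show_expr(a))      # a bare name is captured as a symbol
call_class <- class(show_expr(a > 3))  # a compound expression is captured as a call

# Hypothetical variant of fn: look names up in dat first, then in the caller's frame
fn2 <- function(dat, expr) {
  eval(substitute(expr), envir = dat, enclos = parent.frame())
}

threshold <- 3                 # lives in the calling environment, not in df
res <- fn2(df, a > threshold)  # a comes from df, threshold from the caller
```

Because expr is never evaluated as an ordinary argument, R does not go looking for a in the calling environment, which is what produced the "object 'a' not found" error in the question.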
Richie Cotton
  • +1, very nice (didn't think about substitute). The advantage of match.call is that you have all your arguments in a convenient list, which is why I use that one more often. But if you don't need the rest, substitute is indeed a very nice and easy way. – Joris Meys Jan 16 '11 at 19:36
  • Is there a way to pass multiple expressions in a list() or c() and evaluate each in a for loop for different data frames which are also stored in a list? I want the same functionality I just can't make it work for dataframes and expressions stored in list. – Blind0ne May 01 '17 at 18:02
  • @Blind0ne Start a new question describing your problem exactly. You can link back to this question. – Richie Cotton May 02 '17 at 19:52
10

That's because you're not passing an expression.

Try:

fn <- function(dat, expr) {
  mf <- match.call() # captures the call with its arguments left unevaluated
  eval(mf$expr, envir = dat) # evaluate the captured expr with dat's columns in scope
}

> df <- data.frame( a = 1:5, b = 1:5 )
> fn( df, a )
[1] 1 2 3 4 5
> fn( df, a+b )
[1]  2  4  6  8 10

A quick glance at the source code of functions that use this (e.g. lm) can reveal a lot more interesting things about it.
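
As a rough illustration of what match.call() hands back, the sketch below uses a hypothetical function fn_call (not part of the answer) that simply returns the matched call so it can be inspected. The arguments come back unevaluated and can be pulled out by name.

```r
df <- data.frame(a = 1:5, b = 1:5)

# Hypothetical helper: return the matched call instead of evaluating anything
fn_call <- function(dat, expr) match.call()

mf <- fn_call(df, a + b)

arg_names <- names(as.list(mf))  # "" for the function name, then "dat" and "expr"
captured  <- mf$expr             # the unevaluated call a + b
result    <- eval(captured, envir = df)  # evaluated with df's columns in scope
```

Having every argument available in one call object is convenient when a function needs to inspect or forward several of them; for a single expression, substitute(expr) gives the same captured object more directly.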

Joris Meys
  • Thanks, that's what I was missing! And yes, I tried looking at functions like `subset`, and some others, to see how they do it, but they were internals. Didn't think about `lm`, good point for future reference. – Prasad Chalasani Jan 13 '11 at 17:09
  • I think using substitute in this circumstance is more canonical. And I'm not sure lm is a good role model - at least make sure to read the standard non-standard evaluation rules. – hadley Jan 14 '11 at 01:17
  • @hadley: true. I just thought about `match.call()` and `lm()` because of the `data` argument. – Joris Meys Jan 14 '11 at 09:04
  • @Prasad Chalasani `subset` isn't an internal function; type `subset.data.frame` in the console and you'll see how it's written. – Marek Jan 14 '11 at 15:40
2

A late entry, but the data.table approach and syntax would appear to be what you are after. This is exactly how [.data.table works with the j, i and by arguments.

If you need it in the form fn(x, expr), then you can use the following:

library(data.table)

DT <- data.table(a = 1:5, b = 2:6)

`[`(x=DT, j=a)

## [1] 1 2 3 4 5

`[`(x=DT, j=a * b)
## [1]  2  6 12 20 30

I think it is easier to use in its more native form:

DT[,a]
## [1] 1 2 3 4 5

and so on. In the background, this uses substitute and eval.

mnel
-1

?within might also be of interest.

 df <- data.frame( a = 1:5, b = 1:5 ) 
 within(df, cx <- a > 3)
   a b    cx
 1 1 1 FALSE
 2 2 2 FALSE
 3 3 3 FALSE
 4 4 4  TRUE
 5 5 5  TRUE
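
For comparison with the with call in the question, here is a small base-R sketch of the difference: with returns the value of the expression, while within returns a modified copy of the data frame. The names v and df2 are just illustrative.

```r
df <- data.frame(a = 1:5, b = 1:5)

v   <- with(df, a > 3)          # value of the expression: a logical vector
df2 <- within(df, cx <- a > 3)  # copy of df with the new column cx added

same <- identical(v, df2$cx)    # the new column holds the same values
```

So within is the better fit when the goal is to add or change columns, whereas the question's fn is closer in spirit to with.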
mdsumner