Possible Duplicate:
How to write an R function that evaluates an expression within a data-frame

I want to write a function that sorts a data.frame -- instead of using the cumbersome order(). Given something like

> x=data.frame(a=c(5,6,7),b=c(3,5,1))
> x
  a b
1 5 3
2 6 5
3 7 1

I want to say something like:

sort.df(x,b)

So here's my function:

sort.df <- function(df, ...) {
  with(df, df[order(...),])
}

I was really proud of this. Given R's lazy evaluation, I figured that the ... parameter would only be evaluated when needed -- and by that time it would be in scope, due to 'with'.

If I run the 'with' line directly, it works. But the function doesn't.

> with(x,x[order(b),])
  a b
3 7 1
1 5 3
2 6 5
> sort.df(x,b)
Error in order(...) : object 'b' not found

What's wrong, and how do I fix it? I see this sort of "magic" frequently in packages like plyr, for example. What's the trick?

dk.
  • sort.df(x, x$b) works, but still I have no idea why sort.df(x,b) does not work – Ali Oct 11 '12 at 18:20
  • See also `plyr::arrange`, which does exactly this. – hadley Oct 12 '12 at 14:22
  • Thanks! I didn't know about arrange despite using plyr every day. Yet another example that it's hard to find the right solutions in the R world -- and so much of good R programming is learning best practices using a few good packages. – dk. Oct 12 '12 at 18:27

2 Answers

This will do what you want:

sort.df <- function(df, ...) {
  dots <- as.list(substitute(list(...)))[-1]
  ord <- with(df, do.call(order, dots))
  df[ord,]
}

## Try it out
x <- data.frame(a=1:10, b=rep(1:2, length=10), c=rep(1:3, length=10))
sort.df(x, b, c)
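
To see why this version works, it can help to look at what `substitute(list(...))` actually captures. The following is only a sketch: `capture_dots()` is a hypothetical helper made up for illustration, not part of the answer above.

```r
# Hypothetical helper (an assumption, not from the answer): returns the
# unevaluated expressions that were passed through `...`.
capture_dots <- function(...) {
  as.list(substitute(list(...)))[-1]
}

dots <- capture_dots(b, -c)
# Each element is still an unevaluated expression: a symbol or a call.
print(sapply(dots, class))

# Nothing has been evaluated so far, so `b` and `c` do not need to exist
# in the calling environment. They are only looked up once order() is
# called with the data frame's columns in scope.
x <- data.frame(a = 1:10, b = rep(1:2, length = 10), c = rep(1:3, length = 10))
ord <- with(x, do.call(order, dots))
print(x[ord, ])
```

Because the arguments are captured as expressions rather than values, something like `-c` should also work for a descending sort.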

And so will this:

sort.df2 <- function(df, ...) {
    cl <- substitute(list(...))
    cl[[1]] <- as.symbol("order")
    df[eval(cl, envir=df),]
}
sort.df2(x, b, c)
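
The second version rewrites the captured call instead of unpacking it. A rough sketch of the intermediate step, assuming a hypothetical helper `build_order_call()` that returns the call without evaluating it:

```r
# Hypothetical helper (an assumption, not from the answer): builds the
# call that sort.df2() would evaluate, but stops short of evaluating it.
build_order_call <- function(...) {
  cl <- substitute(list(...))      # unevaluated call: list(b, c)
  cl[[1]] <- as.symbol("order")    # swap the function name: order(b, c)
  cl
}

cl <- build_order_call(b, c)
print(cl)

x <- data.frame(a = 1:10, b = rep(1:2, length = 10), c = rep(1:3, length = 10))
# eval() looks for `b` and `c` among the data frame's columns first.
print(eval(cl, envir = x))
```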
Josh O'Brien
  • Or `sort.df <- function(df, ...) df[order(eval(substitute(...), df)),]` – Joshua Ulrich Oct 11 '12 at 18:23
  • @JoshuaUlrich -- Not quite the same. Yours will only end up sorting by the first element of `...`, since `substitute(...)` only captures that. (Put a `browser()` call in `sort.df()`, and then compare `substitute(...)` and `substitute(list(...))` to see what I mean.) – Josh O'Brien Oct 11 '12 at 18:27

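It's because when you pass b you're not actually passing an object: b exists only as a column of the data frame, and R evaluates `...` arguments in the environment of the caller, not inside with(). Put a browser() call inside your function and you'll see what I mean. I stole this from some Internet robot somewhere: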

x=data.frame(a=c(5,6,7),b=c(3,5,1))

sort.df <- function(df, ..., drop = TRUE){
    ord <- eval(substitute(order(...)), envir = df, enclos = parent.frame())
    return(df[ord, , drop = drop])
}

sort.df(x, b)

will work.
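
As a rough illustration of what the `enclos` argument is for: names that are not columns of `df` get looked up in the caller's environment, so the sort expression can mix columns with the caller's own variables. The wrapper `sort_by_distance()` below is a hypothetical name made up for this sketch.

```r
sort.df <- function(df, ..., drop = TRUE) {
  # Columns of df are searched first, then the caller's environment.
  ord <- eval(substitute(order(...)), envir = df, enclos = parent.frame())
  df[ord, , drop = drop]
}

x <- data.frame(a = c(5, 6, 7), b = c(3, 5, 1))

# Hypothetical wrapper (an assumption): `target` is local to this function,
# while `b` is a column of df.
sort_by_distance <- function(df, target) {
  sort.df(df, abs(b - target))
}

print(sort.df(x, -b))            # descending by b
print(sort_by_distance(x, 1))    # rows ordered by |b - 1|
```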

If you're looking for a nice way to do this in an applied sense, this will also work:

library(taRifx)
sort(x, f=~b)
Tyler Rinker
  • +1 for the nice solution and, especially, for suggesting playing around with a `browser()` call inside the function. IMHO, that's far and away the best way to learn about `...` and all the oddness that surrounds it. – Josh O'Brien Oct 11 '12 at 18:56
  • Someone could correct me on this, but `enclos = parent.frame()` is default in `eval` so simply `eval(substitute(order(...)), envir = df)` also works :) – user1665355 May 20 '13 at 12:07