If I want to deparse the argument of a function for an error or a warning message, something strange happens when the argument is converted to a data.table inside the function:

library(data.table)

e <- data.frame(x = 1:10)
### something strange is happening
foo <- function(u) {
  u <- data.table(u)
  warning(deparse(substitute(u)), " is not a data.table")
  u
}
foo(e)

##  foo(e)
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
## Warning message:
## In foo(e) :
##   structure(list(x = 1:10), .Names = "x", row.names = c(NA, -10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x10026568>) is not a data.table

If I deparse it before calling data.table, everything works fine:

### ok
foo1 <- function(u) {
  nu <- deparse(substitute(u))
  u <- data.table(u)
  warning(nu, " is not a data.table")
  u
}
## foo1(e)
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
## Warning message:
## In foo1(e) : e is not a data.table

By the way, it makes no difference whether e is already a data.table or not. I came across this while profiling some code in which deparse was very time-consuming because e was quite big.

What is happening here, and how can I write such functions so that they handle both data.frame and data.table input?

nachti


2 Answers

This is because substitute treats an ordinary variable differently from a promise object. Every formal argument of a function is bound to a promise, which has a special slot holding the expression that was supplied for that argument in the call. When you use substitute on a promise inside a function, it retrieves that expression rather than the value. From ?substitute:

Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.
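
As a rough illustration of the cases in that excerpt, here is a minimal base R sketch. The function `h` and the names `arg`, `lazy`, `plain`, and `not_defined_here` are made up for this example and are not part of the question's code:

```r
h <- function(arg) {
  delayedAssign("lazy", sqrt(4))  # explicitly created promise
  plain <- 1:3                    # ordinary variable
  list(
    unbound = deparse(substitute(not_defined_here)),  # not bound in this environment: symbol unchanged
    formal  = deparse(substitute(arg)),               # formal argument: expression slot of the promise
    lazy    = deparse(substitute(lazy)),              # delayedAssign promise: its expression
    plain   = deparse(substitute(plain))              # ordinary variable: its value
  )
}

# `a` and `b` do not need to exist, because `arg` is never evaluated
res <- h(a + b)
res$formal   # "a + b"
res$lazy     # "sqrt(4)"
res$plain    # "1:3"
res$unbound  # "not_defined_here"
```

Only the ordinary variable is replaced by its value, which is the case that becomes expensive when the value is a large table.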

In your case, you actually overwrite the original promise variable with a new one with:

u <- data.table(u)

at which point u becomes an ordinary variable that contains a data table. When you call substitute on u after this point, it returns the data table itself, and deparse then converts that back into the R code that would generate it. For a large table that is a lot of work, which is why it is slow.

This also explains why your second example works: you substitute while the variable is still a promise (i.e. before you overwrite u). It is also the answer to your second question: either substitute before you overwrite your promise, or don't overwrite your promise.
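
A minimal sketch of that difference, using only base R so that it runs without data.table. The function name `show_overwrite` is hypothetical, and `as.list` merely stands in for the `data.table(u)` conversion:

```r
show_overwrite <- function(u) {
  # u is still a promise here, so substitute() yields the caller's expression
  before <- deparse(substitute(u))
  # overwriting u replaces the promise with an ordinary variable
  u <- as.list(u)
  # substitute() now returns the value, so the whole object gets deparsed
  after <- paste(deparse(substitute(u)), collapse = "")
  c(before = before, after = after)
}

e <- data.frame(x = 1:10)
res <- show_overwrite(e)
res[["before"]]  # "e"
res[["after"]]   # the deparsed list, not the name
```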

For more details, see section 2.1.8 of the R Language Definition (promises) which I excerpt here:

Promise objects are part of R’s lazy evaluation mechanism. They contain three slots: a value, an expression, and an environment. When a function is called the arguments are matched and then each of the formal arguments is bound to a promise. The expression that was given for that formal argument and a pointer to the environment the function was called from are stored in the promise.

BrodieG
  • @nachti, does this not answer your question? – BrodieG May 15 '14 at 01:47
  • @[BrodieG](http://stackoverflow.com/users/2725969/brodieg): Thanks for the answer. As written above: How can I handle such functions for `data.frame` and `data.table` input? Should I copy it (needs a lot of space)? Or deparse everything first and then overwrite it? – nachti May 16 '14 at 14:41
  • @nachti, the latter, deparse first. Also, if you want to avoid copies you should consider using `setDT` instead of `data.table`. The former creates a data table by reference. – BrodieG May 17 '14 at 19:16
  • Perhaps use `delayedAssign("u", data.table(u))` in place of `u<-data.table(u)` in the first example? – Aaron McDaid Sep 16 '16 at 10:35
  • @AaronMcDaid I don't think that helps; did you try it? Desired output is likely `Warning: data.table(e) is not a data.table`. – BrodieG Sep 16 '16 at 12:03
  • You're right. I didn't test it. Thanks for checking! In my current tests, it deparses to `data.table(u)`, whereas I guess the questioner wants `data.table(e)` or simply `e`. – Aaron McDaid Sep 16 '16 at 12:46
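
Putting the advice from these comments together (deparse first, then convert), one possible shape for a function that accepts either kind of input is sketched below. It is only an illustration under stated assumptions: `foo2`, the flag `was_df`, and the derived column `y` are made up, the input is assumed to have a column `x` as in the question, and the conversion uses `as.data.table`, which copies a data.frame, rather than the by-reference `setDT` mentioned above.

```r
library(data.table)

# Hypothetical helper, not from the original post
foo2 <- function(u) {
  nu <- deparse(substitute(u))    # u is still a promise: this is the caller's expression
  was_df <- !is.data.table(u)     # remember what kind of object came in
  if (was_df) {
    warning(nu, " is not a data.table")
    u <- as.data.table(u)         # copies; from here on u is an ordinary variable
  }
  out <- u[, .(x, y = x * 2L)]    # data.table syntax; builds a new table (assumes column x)
  if (was_df) setDF(out)          # hand back a data.frame if that is what came in
  out
}

e <- data.frame(x = 1:3)
r <- suppressWarnings(foo2(e))    # a data.frame with columns x and y
```

With a data.table as input, no conversion or warning happens and a data.table comes back.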

You could probably do this with sprintf too, along with is.data.table.

> library(data.table)
> e <- data.frame(x = 1:10)
> foo <- function(u){
      nu <- deparse(substitute(u))
      if(!is.data.table(u)){
          warning(sprintf('%s is not a data table', nu))
          u
      } else {
          u
      }
  }
> foo(e)
    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10
Warning message:
In foo(e) : e is not a data table
Rich Scriven
  • Thanks for your answers! @Richard: Good idea, but then it's still NOT a data.table and I can't use `:=`, for example. It should be used within a function that accepts either a `data.frame` or a `data.table` as input, does something in `data.table` syntax, and converts the result back to a `data.frame` if the input was one. BTW `u` is big (~250 MB) – nachti May 09 '14 at 13:52