I am trying to use data.table's .. notation inside functions. Here is the code I have so far:

library(data.table)

set.seed(42)
dt <- data.table(
  x = rnorm(10),
  y = runif(10)
)

test_func <- function(data, var, var2) {
  vars <- c(var, var2)
  data[, ..vars]
}

test_func(dt, 'x', 'y') # this works

test_func2 <- function(data, var, var2) {
  data[, ..var]
}

test_func2(dt, 'x', 'y') # this works too

test_func3 <- function(data, var, var2) {
  data[, sum(..var)]
}

test_func3(dt, 'x', 'y') 
# this does not work
# Error in eval(jsub, SDenv, parent.frame()) : object '..var' not found

It seems data.table does not recognize .. once it is wrapped inside another function call in j. I know I can use sum(get(var)) to get the result, but I want to know whether that is the best practice in most situations.

EKtheSage

1 Answer

Parroting an answer to a different problem that works here as well. Not the prettiest solution, but variants on this have worked for me numerous times in the past.

Thanks @Frank for a non-parse() solution here!

I'm well familiar with the old adage "If the answer is parse() you should usually rethink the question.", but I often have a hard time coming up with alternatives when evaluating within the data.table calling environment. I'd love to see a robust solution that doesn't execute arbitrary code passed in as a character string. In fact, half the reason I'm posting an answer like this is in hopes that someone can recommend a better option.

test_func3 <- function(data, var, var2) {
  expr = substitute(sum(var), list(var=as.symbol(var)))
  data[, eval(expr)]
}

test_func3(dt, 'x', 'y')
## [1] 5.472968
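
The same substitute() pattern appears to extend to grouped sums of two columns, which is the case raised in the comments. The following is only a sketch: sum_by(), the output names s1 and s2, and the grouping column z are made up for illustration and are not part of data.table or of the original question.

```r
library(data.table)

set.seed(42)
dt <- data.table(
  x = rnorm(10),
  y = runif(10),
  z = rep(c("a", "b"), each = 5)  # assumed grouping column, for illustration only
)

# sum_by() is a hypothetical helper, not a data.table function.
sum_by <- function(data, var, var2, by_var) {
  # Build list(s1 = sum(<var>), s2 = sum(<var2>)), turning the strings into column symbols.
  expr <- substitute(
    list(s1 = sum(a), s2 = sum(b)),
    list(a = as.symbol(var), b = as.symbol(var2))
  )
  # by_var is a character vector naming the grouping column(s).
  data[, eval(expr), by = by_var]
}

res <- sum_by(dt, "x", "y", "z")
```

The strings are only ever converted with as.symbol(), so they are looked up as column names rather than parsed as code.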

Quick disclaimer on hypothetical doomsday scenarios possible with eval(parse(...))

There are far more in-depth discussions of the dangers of eval(parse(...)) elsewhere, so I'll avoid repeating them in full.

Theoretically you could have issues if one of your columns is named something unfortunate like "(system(paste0('kill ',Sys.getpid())))" (do not execute that; it will kill your R session on the spot!). This is probably a remote enough chance that it isn't worth losing sleep over unless you plan on putting this in a package on CRAN.
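
To make the difference concrete, here is a small base R sketch of how parse() and as.symbol() treat the same string. The string "x + 1" is a deliberately harmless stand-in for an unfortunate column name, and the variable names are mine.

```r
# A column name that happens to look like code (harmless on purpose).
col <- "x + 1"

# parse() turns the string into a call, which eval() would then execute.
parsed <- parse(text = col)[[1]]
is.call(parsed)  # TRUE

# as.symbol() keeps the whole string as one name, so nothing inside it is run.
sym <- as.symbol(col)
is.name(sym)     # TRUE

# The substitute() approach therefore only ever inserts a name into sum().
expr <- substitute(sum(var), list(var = sym))
is.name(expr[[2]])  # TRUE
```

With the symbol version, evaluating expr against a table that lacks such a column should simply fail with an "object not found" error instead of running anything.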


Update:

For the specific case in the comments below, where the table is grouped and sum is then applied to each of the chosen columns, .SDcols is potentially useful. The only way I'm aware of to make sure that this function returns consistent results even if dt has a column named var3 is to evaluate the arguments within the function environment, but outside of the data.table environment, using c().

set.seed(42)
dt <- data.table(
  x = rnorm(10),
  y = rnorm(10),
  z = sample(c("a","b","c"),size = 10, replace = TRUE)
)


test_func3 <- function(data, var, var2, var3) {
  ListOfColumns = c(var,var2)
  GroupColumn <- c(var3)
  data[, lapply(.SD, sum), by = eval(GroupColumn), .SDcols = ListOfColumns]
}

test_func3(dt, 'x', 'y','z') 

returns

   z         x         y
1: b 1.0531555  2.121852
2: a 0.3631284 -1.388861
3: c 4.0566838 -2.367558
Matt Summersgill
  • Or `sum(data[[var]])` at least for the OP's example. – Frank Feb 15 '18 at 19:18
  • @Frank that is a better solution for this simplistic use case, but I'm not sure it can be generalized to more complex cases, like grouping by `var2` in that function. Would you have any recommendations on [this question](https://stackoverflow.com/questions/48234064/getx-does-not-work-in-r-data-table-when-x-is-also-a-column-in-the-data-table)? – Matt Summersgill Feb 15 '18 at 19:32
  • @MattSummersgill, that's what I am trying to accomplish, `sum(var), sum(var2), group by var3`, and I want to return the results as a data.table. Right now I am using `data[, .(sum(get(var)), sum(get(var2))), by = var3]` – EKtheSage Feb 15 '18 at 19:47
  • @MattSummersgill, thanks! this is a very good solution. I also found that you could use `by = var3` directly without using `c()`. I am still hoping there's a `..` solution. – EKtheSage Feb 15 '18 at 21:41