Non-standard evaluation of by in data.table

Question

I am lost with the evaluation of by in data.table. What would be the correct way to merge the functionality of LJ and LJ2 into one function?

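library(data.table)

# LJ evaluates by_, so it expects a character value such as "A".
LJ <- function(dt_x_, dt_y_, by_)
{
    merge(
        dt_x_,
        dt_y_,
        by = eval(substitute(by_)), all.x = TRUE, sort = FALSE)
}
# LJ2 deparses the unevaluated argument, so it expects a bare name such as A.
LJ2 <- function(dt_x_, dt_y_, by_)
{
    merge(
        dt_x_,
        dt_y_,
        by = deparse(substitute(by_)), all.x = TRUE, sort = FALSE)
}
# Column name passed as a string.
LJ(
    data.table(A = c(1,2,3)),
    data.table(A = c(1,2,3), B = c(11,12,13)),
    "A")
# Column name passed as a bare symbol.
LJ2(
    data.table(A = c(1,2,3)),
    data.table(A = c(1,2,3), B = c(11,12,13)),
    A)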

I get the same output with both functions. It is not clear what you intend. – akrun Sep 08 '16 at 10:07
This isn't related to data.table; `merge.data.frame` will behave the same. – jangorecki Sep 08 '16 at 13:08

Roland · Accepted Answer · 2016-09-09T07:54:53.957

I consider this a bad idea. Have the user always pass a character value. You could do this:

LJ3 <- function(dt_x_, dt_y_, by_)
{ 
  by_ <- gsub('\"', "", deparse(substitute(by_)), fixed = TRUE)
  dt_y_[dt_x_, on = by_] 
}

LJ3(
  data.table(A = c(4,1,2,3)),
  data.table(A = c(1,2,3), B = c(11,12,13)), 
  A)
#   A  B
#1: 4 NA
#2: 1 11
#3: 2 12
#4: 3 13

LJ3(
  data.table(A = c(4,1,2,3)),
  data.table(A = c(1,2,3), B = c(11,12,13)), 
  "A")
#   A  B
#1: 4 NA
#2: 1 11
#3: 2 12
#4: 3 13

This question is not related to data.table. The by parameter in merge.data.table always expects a character value, as does on.
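
To illustrate the recommendation to have the user always pass a character value, here is a minimal sketch of a character-only wrapper. The name LJ_chr and the validation checks are assumptions made for illustration, not part of the original answer. The demo uses base data.frames so that it runs without extra packages; merge() dispatches to the data.table method when data.tables are passed.

```r
# Hypothetical character-only variant: no substitute(), no deparse().
LJ_chr <- function(dt_x_, dt_y_, by_)
{
  # Fail early unless by_ is a character vector naming columns in both inputs.
  stopifnot(is.character(by_), length(by_) >= 1L,
            all(by_ %in% names(dt_x_)), all(by_ %in% names(dt_y_)))
  merge(dt_x_, dt_y_, by = by_, all.x = TRUE, sort = FALSE)
}

x <- data.frame(A = c(1, 2, 3))
y <- data.frame(A = c(1, 2, 3), B = c(11, 12, 13))

# Column name given directly as a string.
res <- LJ_chr(x, y, "A")

# Column name held in a variable: this works because by_ is evaluated normally.
key_col <- "A"
res2 <- LJ_chr(x, y, key_col)
```

One practical benefit of this design is that the join column can be computed or stored in a variable, which the substitute()-based versions cannot support because they never evaluate the argument.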

Edit: @eddi points out that the above will fail if you have column names with actual " in them (something you should avoid in general, but may happen if you fread some input files prepared by others).

An alternative that can handle such edge cases would be:

LJ4 <- function(dt_x_, dt_y_, by_)
{ 
  by_ <- substitute(by_)
  if (!is.character(by_)) by_ <- deparse(by_)
  dt_y_[dt_x_, on = by_] 
}
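
To see why the is.character() check works, it may help to isolate the name-resolution step from LJ4. The helper below, col_name, is a hypothetical name introduced only for this sketch; it uses base R alone.

```r
# Hypothetical helper isolating the name-resolution step used in LJ4.
col_name <- function(by_)
{
  by_ <- substitute(by_)                       # the unevaluated argument
  if (!is.character(by_)) by_ <- deparse(by_)  # symbols become strings
  by_
}

col_name(A)        # bare symbol is deparsed to "A"
col_name("A")      # string constant is returned unchanged: "A"
col_name("a\"b")   # embedded quote is preserved, unlike with gsub()

key_col <- "A"
col_name(key_col)  # gives "key_col": the variable is not evaluated
```

The last call shows the main limitation of this style: a column name stored in a variable is not looked up, so the function receives the variable's name instead of its value.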

edited Sep 09 '16 at 07:54

answered Sep 08 '16 at 10:30

Fyi, `on` accepts `.()`-style args in the devel version, partly to make non-equi joins easier, I guess. Items 32 & 33 on the 1.9.7 news: https://github.com/Rdatatable/data.table/blob/master/NEWS.md – Frank Sep 08 '16 at 12:04
Thanks for the heads-up. I always refer to the CRAN version when answering. I need a somewhat stable version myself, so I don't even install devel versions, unless I need to test something for future compatibility. – Roland Sep 08 '16 at 12:53
This `gsub` approach will fail if there are actually quotes in the column name. I'd simply check if `substitute(by_)` is character, and only deparse if it isn't. – eddi Sep 08 '16 at 15:38
@eddi Thanks for the suggestion. – Roland Sep 09 '16 at 07:55
