0

I am wondering if it is possible to use two columns to do a lookup in R data.table. Here is a little experiment that failed:

x <- data.table(A = c("a", "a", "b", "b", "c", "c"),
                D = c( 1,   2,   1,   2,   4,   5))
DT <- data.table(A = c("a", "a", "b", "b"),
                D = c( 1,   2,   1,   2))
setkey(DT, A, D)

DT[J(x$A, x$D), ]  # Same as below, thanks to ilir, I thought it returns an error previously
DT[J(x$A, x$D), , allow.cartesian=TRUE] 
# Return:
#    A D
# 1: a 1
# 2: a 2
# 3: b 1
# 4: b 2
# 5: c 4 # <- ideally (NA NA) or (c NA)
# 6: c 5 # <- ideally (NA NA) or (c NA)

In this experiment, rows in DT are unique, however, both columns have duplicated entries. When calling DT[J(x$A, x$D), ], what I want to do is to lookup table DT, thus I would expect the result only has entries in DT, however, this is clearly not the case.

Is there an effective way to do a lookup with two columns as keys?

yuez
  • 885
  • 1
  • 8
  • 16
  • You need to have the same key on both `data.tables`. I am sure the error read something to that effect. – ilir Apr 17 '14 at 13:23
  • All I want to do is to find positions of rows in x that's not in DT. I am sure you could have entries in x that are not in DT. I have another post regarding this - http://stackoverflow.com/questions/23087358/why-is-allow-cartesian-required-at-times-when-when-joining-data-tables-with-dupl – yuez Apr 17 '14 at 13:26
  • You should not get an error in the first case. `allow.cartesian` should only kick in when the result would have more than 6 lines (`max(nrow(x), nrow(DT))`). The other result is also what I would expect, the keys are always returned. If you had a third column in `DT`, that would come back as `NA` for the last two rows. – ilir Apr 17 '14 at 13:42

1 Answers1

1

Thanks to ilir, the following code works:

x <- data.table(A = c("a", "a", "b", "b", "c", "c"),
                D = c( 1,   2,   1,   2,   4,   5))
DT <- data.table(A = c("a", "a", "b", "b"),
                 D = c( 1,   2,   1,   2))
DT[, aux := 1L]
setkey(DT, A, D)
DT[J(x$A, x$D), ]

inx <- !is.na(DT[J(x$A, x$D), ]$aux)
Arun
  • 116,683
  • 26
  • 284
  • 387
yuez
  • 885
  • 1
  • 8
  • 16