I have a data.table with a logical column. Why can't the name of the logical column be used directly in the i argument? See the example below.

library(data.table)

dt <- data.table(x = c(T, T, F, T), y = 1:4)

# Works
dt[dt$x]
dt[!dt$x]

# Works
dt[x == T]
dt[x == F]

# Does not work
dt[x]
dt[!x]
tk421
djhurio

3 Answers

From ?data.table

Advanced: When i is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope.

So dt[x] will try to evaluate x in the calling scope (in this case the global environment), where no object named x exists.
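To make the rule concrete, here is a minimal base-R sketch. `toy_subset()` is a hypothetical function made up for illustration; it is not data.table's actual implementation. It only mimics the documented behaviour: a bare name in `i` is looked up in the calling scope, while any other expression is evaluated with the columns visible as variables.

```r
# toy_subset() is a hypothetical illustration, NOT data.table's real code.
# It mimics the documented rule for `i`:
#   * a bare name is evaluated in the calling scope only;
#   * any other expression sees the columns as variables first.
toy_subset <- function(df, i) {
  isub <- substitute(i)          # capture the unevaluated `i` argument
  caller <- parent.frame()       # the scope the user called us from
  rows <- if (is.name(isub)) {
    eval(isub, caller)           # bare name: columns are NOT visible
  } else {
    eval(isub, df, caller)       # expression: columns first, then caller
  }
  df[rows, , drop = FALSE]
}

df <- data.frame(x = c(TRUE, TRUE, FALSE, TRUE), y = 1:4)

# Bare name: fails unless an object called `x` exists in the calling scope
msg <- tryCatch(toy_subset(df, x), error = function(e) conditionMessage(e))
print(msg)

# Wrapped in parentheses: no longer a bare name, so the column is used
print(toy_subset(df, (x)))       # rows 1, 2 and 4
```

The only thing that changes between the two calls is whether `substitute(i)` yields a symbol or a call, which is the same distinction the quoted documentation draws.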

You can get around this by wrapping x in ( or {, or by passing it through force:

dt[(x)]
dt[{x}]
dt[force(x)]
Henrik
mnel
  • (+1) interesting use of `force` function. How does `force` work in this case? How does it alter environment/scope? – Nishanth Apr 24 '13 at 12:21
  • A bit more info on _why_ [here](http://r.789695.n4.nabble.com/Indexing-by-a-logical-column-tp4665153p4665142.html). – Matt Dowle Apr 24 '13 at 12:26
  • `force` basically stops it being interpreted as a single variable (this is done with some computing on the call within `[.data.table`). `force` then forces the evaluation of `x`, which will return `x` within the data.table scope. – mnel Apr 24 '13 at 12:28
  • @e4e5f5 `force` works just because it makes `i` not a single name anymore. `dt[identity(x)]` would work for the same reason, or just `dt[(x)]` is easiest. I'm kinda liking `(x)` on the LHS of `:=` too, instead of `with=FALSE`, so `(x)` is starting to become idiomatic `data.table` (although it's more by happy accident than by design). – Matt Dowle Apr 24 '13 at 12:31
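A quick check of the workarounds from this answer and its comments, assuming a reasonably recent data.table. Each `i` below is a call rather than a bare name, so it is evaluated with the columns of `dt` in scope; `dt[dt$x]` from the question is included for comparison.

```r
library(data.table)

dt <- data.table(x = c(TRUE, TRUE, FALSE, TRUE), y = 1:4)

# None of these `i` arguments is a bare name, so data.table evaluates
# them with the columns of dt visible as variables.
results <- list(
  paren    = dt[(x)],
  brace    = dt[{x}],
  force    = dt[force(x)],
  identity = dt[identity(x)],
  dollar   = dt[dt$x]
)

# Show which y values each variant selected
print(vapply(results, function(d) paste(d$y, collapse = ","), character(1)))

# Negation works too once x is wrapped
print(dt[!(x)])                  # only the row where x is FALSE
```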

x is not defined in the global environment, which is why dt[x] fails. If you try this:

> with(dt, dt[x])
      x y
1: TRUE 1
2: TRUE 2
3: TRUE 4

That works. So does this:

> attach(dt)
> dt[!x]
       x y
1: FALSE 3

EDIT:

According to the documentation, the j argument accepts a column name; in fact:

> dt[x]
Error in eval(expr, envir, enclos) : object 'x' not found
> dt[j = x]
[1]  TRUE  TRUE FALSE  TRUE

The i argument, on the other hand, accepts either a numeric or a logical expression (and x itself is logical); however, data.table does not seem to treat x as logical without an explicit call such as as.logical():

> dt[i = x]
Error in eval(expr, envir, enclos) : object 'x' not found
> dt[i = as.logical(x)]
      x y
1: TRUE 1
2: TRUE 2
3: TRUE 4
Michele
  • Not sure this is a problem: `x` is not defined in the global environment, but `dt[x == T]` works. – djhurio Apr 24 '13 at 11:56
  • You're right; however, this error `Error in eval(expr, envir, enclos) : object 'x' not found` indicates that. So you have probably highlighted a possible bug. – Michele Apr 24 '13 at 11:58
  • @djhurio In both the `i` and `j` parts of the documentation of `[.data.table` it says the `expression is evaluated within the frame of the data.table (i.e. it sees column names as if they are variables)`. However, in the `i` argument it seems that an explicit expression like `==` or `as.logical` is needed. – Michele Apr 24 '13 at 12:11
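The calling-scope lookup quoted in the other answer explains both results here. A sketch, assuming a recent data.table: `dt[x]` uses whatever `x` is visible where it is called, even when that differs from the column. The global vector named `x` below is purely illustrative.

```r
library(data.table)

dt <- data.table(x = c(TRUE, TRUE, FALSE, TRUE), y = 1:4)

# with() puts the column x into the scope dt[x] is called from
r1 <- with(dt, dt[x])            # rows where the column is TRUE

# as.logical(x) is a call, so it is evaluated inside dt: the column is used
r2 <- dt[as.logical(x)]          # rows where the column is TRUE

# Illustrative only: a global object named x that differs from the column
x <- c(FALSE, FALSE, TRUE, FALSE)
r3 <- dt[x]                      # bare name: uses the GLOBAL x (row 3)
r4 <- dt[(x)]                    # wrapped: uses the COLUMN x (rows 1, 2, 4)
rm(x)
```

The `r3` versus `r4` pair is the clearest illustration: the same two characters select different rows depending only on whether `i` is a bare name.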

This should also work and is arguably more natural:

setkey(dt, x)
dt[J(TRUE)]
dt[J(FALSE)]
Rico
  • It's worth noting that setting a key and joining has a significantly different asymptotic complexity than does filtering on a column. The former requires sorting the data first, whereas the latter can be done in a linear pass. – Andreas Mar 22 '18 at 02:32
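To compare the two approaches side by side, here is a sketch assuming a recent data.table. It works on a copy so the original `dt` stays unkeyed, since `setkey()` reorders its argument in place. As the comment above points out, the keyed route pays for a sort up front, while the wrapped-column filter is a single pass.

```r
library(data.table)

dt <- data.table(x = c(TRUE, TRUE, FALSE, TRUE), y = 1:4)

# Vector scan: one pass over the column, original row order kept
filtered <- dt[(x)]

# Keyed join: setkey() sorts the copy by x in place (by reference),
# after which J(...) looks up matching key values in the sorted table
keyed <- copy(dt)
setkey(keyed, x)
joined_true  <- keyed[J(TRUE)]
joined_false <- keyed[J(FALSE)]

print(filtered)
print(joined_true)
print(joined_false)
```

For a one-off filter the vector scan is usually enough; the key pays off when the same table is looked up by `x` repeatedly.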