31

I want to subset a data.table using a variable that has the same name as a column, which leads to some problems:

library(data.table)

dt <- data.table(a=sample(c('a', 'b', 'c'), 20, replace=TRUE),
                 b=sample(c('a', 'b', 'c'), 20, replace=TRUE),
                 c=sample(20), key=c('a', 'b'))

env <- environment()
a <- 'b'
dt[a == a]  # both a's resolve to the column, so every row is returned

#Expected Result
dt[a == 'b']

I came across this possible solution:

env <- environment()
dt[a == get('a',env)]

But it is as cumbersome as:

this.a = a
dt[a == this.a]

Is there a more elegant solution?

jakob-r
  • 5
    We're aware of this [scoping issue](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2110&group_id=240&atid=978). This is very much a priority and will be fixed ASAP. Thanks for reporting. For now, using a different variable name would be the way to go. – Arun Feb 09 '14 at 12:24
  • 1
    I'm confused - why would you think that `a == a` should work or is good syntax? R-forge seems to be down for me atm, so I can't see the link from @Arun and what exactly it's about, but making `a == a` work (in the way OP wants it to work) seems like a bad idea to me and I think your last solution *is* the correct one. – eddi Feb 09 '14 at 19:34
  • 1
    Separately from my above comment, since your `data.table` is keyed by `a`, you can do `dt[a]` – eddi Feb 09 '14 at 19:38
  • see http://stackoverflow.com/questions/15102068/keyed-lookup-on-data-table-without-with/15102156#15102156 – mnel Feb 10 '14 at 00:47

2 Answers

13

For now, a temporary workaround could be:

# Look up a variable by name in .env, bypassing the data.table's columns
`..` <- function (..., .env = globalenv())
{
  get(deparse(substitute(...)), envir = .env)
}

..(a)
## [1] "b"

dt[a==..(a)]
##    a b  c
## 1: b a 15
## 2: b a 11
## 3: b b  8
## 4: b b  4
## 5: b c  5
## 6: b c 12

Though this looks elegant, I am still waiting for a more robust solution to such scope issues.

Edited according to @mnel's suggestion:

`..` <- function (..., .env = sys.parent(2))
{
  get(deparse(substitute(...)), envir = .env)
}
xb.
  • 3
    `env=sys.parent(2)` might be safer. – mnel Feb 10 '14 at 00:50
  • @mnel A nice concern! May I ask "why" and where can I find any related documentation to read? – xb. Feb 10 '14 at 04:42
  • 2
    You may be calling `[.data.table` from something which isn't the global environment; `sys.parent(2)` will search and find the correct calling environment. – mnel Feb 10 '14 at 05:09
  • Thanks. `sys.parent(2)` should be a more practical option! – xb. Feb 10 '14 at 05:20
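
To illustrate mnel's point about calls that do not originate in the global environment, here is a minimal sketch that passes the environment explicitly, reusing the `get()` approach from the question. The wrapper name `filter_a` and the small example table are hypothetical and only serve the illustration; this is not meant as the definitive fix.

```r
library(data.table)

# Small deterministic table (hypothetical example data)
dt <- data.table(a = c("a", "b", "b", "c"), c = 1:4)

# Hypothetical wrapper whose argument shares its name with column `a`
filter_a <- function(dt, a) {
  # Capture the function's own frame, which is where the argument `a` lives
  env <- environment()
  # The left-hand `a` is the column; get() fetches the argument from `env`
  dt[a == get("a", envir = env)]
}

res <- filter_a(dt, "b")
```

A lookup in `globalenv()` would miss the argument here, because `a` exists only in the frame of `filter_a`.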
10

This is now simple, since data.table introduced the `..` prefix, which looks a variable up in the calling scope:

dt[eval(dt[, a %in% ..a])]

or even simpler in your particular case (since a is the first key column):

dt[eval(.(a))] # identical to dt["b"]
George Shimanovsky
  • 3
    Re dt[a], good idea for the OP's case of a string column, but it does not generalize. Eg, `dt2 = data.table(x = c(1,2,2), key="x"); x = 2; dt2[x]` here x is recognized as a vector of row numbers. – Frank 2 Oct 11 '19 at 22:24
  • 1
    Thank you for your note, Frank, you're right. I've replaced the dt2[x] approach with dt2[eval(.(x))] to generalize. – George Shimanovsky Oct 12 '19 at 19:23
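
The `..` prefix used in this answer can be seen in isolation in the sketch below. It assumes a data.table version that supports `..` inside `j` expressions, and the small example table is hypothetical. The logical vector is computed in `j` first and then used to subset the rows.

```r
library(data.table)

# Small deterministic table (hypothetical example data), keyed by `a`
dt <- data.table(a = c("a", "b", "b", "c"), c = 1:4, key = "a")
a <- "b"

# In j, `a` is the column and `..a` is the variable `a` from the calling scope
keep <- dt[, a %in% ..a]

# Use the resulting logical vector to pick the matching rows
res <- dt[keep]
```

Splitting the call in two makes it easier to see which `a` is the column and which is the variable.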