Subsetting a data.table with a variable (when varname identical to colname)

Question

How can I subset a data.table by using a variable, when the variable name is identical to an existing column name in the data.table? It works with get("varname", pos = 1), but is there a more robust/flexible solution?

library(data.table)

my_data_frame <- data.frame(
  "V1" = c("A", "B", "C", "A"),
  "V2" = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

V1 <- "A"

my_data_table <- as.data.table(my_data_frame)

# Can I improve this a bit? I want rows where V1 == "A", but use V1 in the statement 
my_data_table[ my_data_table$V1 == get("V1", pos = 1), ]

Renaming V1 is not an option.

UPDATE: I do not consider this a 100% duplicate. The accepted answer for this question is not acceptable for my question, since it uses explicit get which I do not want to use, as stated in the comments.

I don't get it, what's wrong with `my_data_table[,"V1"=="A"]` or `my_data_table[,"V1"==V1]`? — user2974951, Sep 24 '18 at 08:49
@user2974951 Thanks, but your solutions do not return the desired result, since you do not use data.table syntax. The desired result has two rows. — nilsole, Sep 24 '18 at 08:53
I do not want to state the level of environment (pos = 1) explicitly as done in the example. Instead, I would like to make R look for an outer object called "V1" rather than using V1 as the column name. The above code works, but will not necessarily work when I copy the code into a different scope. — nilsole, Sep 24 '18 at 08:56
Perhaps a bit unorthodox to do row subsetting in `j`, but then we can use the ['dot dot notation'](https://github.com/Rdatatable/data.table/blob/master/NEWS.md#changes-in-v1102--on-cran-31-jan-2017): `d[ , d[V1 == ..V1]]` — Henrik, Sep 24 '18 at 09:22
Another option is to specify the environment: `my_data_table[V1 == get("V1", envir = .GlobalEnv)]` — Jaap, Sep 24 '18 at 09:24
@Henrik Tried your example, but it gives me `Error in eval (expr, envir, enclos): object '..V1' not found` — nilsole, Sep 24 '18 at 09:27
It works here (I just used the shorter "d" as name of the data set). Do you have `data.table` version >= v1.10.2? — Henrik, Sep 24 '18 at 09:34
another alternative similar to Henrik: `d[d[, .I[V1 == ..V1]]]` — chinsoon12, Sep 24 '18 at 10:14
Possible duplicate of [data.table := assignments when variable has same name as a column](https://stackoverflow.com/questions/32738499/data-table-assignments-when-variable-has-same-name-as-a-column) — h3rm4n, Sep 24 '18 at 12:28

score 3 · Answer 1 · edited Sep 24 '18 at 13:34

Here is a solution using library(tidyverse):

library(data.table)
library(tidyverse)
my_data_frame <- data.frame(
  "V1"=c("A","B","C","A"),
  "V2"=c(1, 2, 3, 4),
  stringsAsFactors = FALSE        
)

V1 = "A"
my_data_table <- as.data.table(my_data_frame)
df = my_data_table %>% filter(V1 == !!get("V1")) #you do not have to specify pos = 1

If you want to make R use the object named "V1", you can do this:

V1 = "A"
list_test = split(my_data_table, as.factor(my_data_table$V1)) #create a list for each factor level of the column V1.
df = list_test[[V1]] #extract the desired dataframe from the list using the object "V1"

Is it what you want?
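On the name collision itself: newer releases of rlang/dplyr provide the `.env` pronoun, which postdates this thread. The following is only a minimal sketch, assuming an installed dplyr version that supports `.env`; `target_rows` is an illustrative name, not something from the question.

```r
library(dplyr)

my_data_frame <- data.frame(
  V1 = c("A", "B", "C", "A"),
  V2 = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

V1 <- "A"  # same name as the column

# .data$V1 refers to the column; .env$V1 refers to the variable
# in the environment the call is made from (assumes .env support)
target_rows <- my_data_frame %>% filter(.data$V1 == .env$V1)
```

Because `.env` resolves relative to where `filter()` is called, the same line should also behave inside a function that has its own `V1`, with no `pos` or `envir` spelled out.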

edited Sep 24 '18 at 13:34

Axeman

answered Sep 24 '18 at 09:02

Paul

Everything is correct here, but perhaps you could give the tidyverse solution for the name collision of `V1`, as that seems to be the gist of the problem here. – Axeman Sep 24 '18 at 12:32
Thanks for your suggestion. I am not sure I fully understand how the tidyverse can help with the name collision. `dplyr::filter` only uses the data frame column `V1`. It avoids the name collision by removing the need for the object `V1`. But I thought this object was needed, so I wrote the second solution with `base::split`, which allows using both the object `V1` and the data frame column `V1` without ambiguity. Plus, the list_test object is very convenient to work with lapply() and custom functions. – Paul Sep 24 '18 at 12:56
In other words, can you write the `filter` statement such that it is guaranteed to use the global variable instead of the column name? (Like the data.table `..` notation.) For the general case? I know one can write `.data$` for the opposite, but I'm not sure how to force the scope to be outside of the data.frame. – Axeman Sep 24 '18 at 13:00
I found this: `V1 = "A"; my_data_table <- as.data.table(my_data_frame); df = my_data_table %>% filter(V1 == !!get("V1"))`, inspired by this post: https://stackoverflow.com/questions/34219912/how-to-use-a-variable-in-dplyrfilter – Paul Sep 24 '18 at 13:20

score 3 · Accepted Answer · answered Sep 24 '18 at 09:04

If you don't mind doing it in 2 steps, you can build the row filter outside the scope of your data.table (though that's usually not what you want to do when working with data.table...):

# V1 inside [ ] is the column; V1 after == is the variable
wh_v1 <- my_data_table[, V1] == V1
my_data_table[wh_v1]
#   V1 V2
#1:  A  1
#2:  A  4
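
Since the question is about code that keeps working when it is moved into a different scope, here is a rough sketch of the same two-step idea wrapped in a function. `subset_by_v1` and its arguments are hypothetical names used only for illustration; the column is extracted by name with `[[`, so the comparison never runs inside the data.table's own scope.

```r
library(data.table)

# Hypothetical helper: V1 here is a function argument, not a global
subset_by_v1 <- function(dt, V1) {
  # Computed outside dt[...]: dt[["V1"]] is the column,
  # the V1 on the right-hand side is the argument
  keep <- dt[["V1"]] == V1
  dt[keep]
}

my_data_table <- data.table(V1 = c("A", "B", "C", "A"), V2 = c(1, 2, 3, 4))
result <- subset_by_v1(my_data_table, "A")
```

No `get()` or explicit environment position is needed, because the lookup of the variable happens in ordinary R scoping before the subset is taken.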

answered Sep 24 '18 at 09:04

Cath

Your answer is my favourite along with that of @Henrik . – nilsole Sep 24 '18 at 09:41

score 1 · Answer 3 · answered Sep 24 '18 at 15:58

For equality conditions, you can use a join:

mDT = data.table(V1)
my_data_table[mDT, on=.(V1), nomatch=0]
#    V1 V2
# 1:  A  1
# 2:  A  4

Implicitly, the join condition in x[i, on=.(V1)] is

V1 == V1

where the LHS comes from x and the RHS from i. It is like looking up each row of i in x. nomatch=0 means that any value found in i but not in x is dropped from the output. For example:

mDT2 = data.table(V1 = c("A", "D"))
my_data_table[mDT2, on=.(V1)]
#    V1 V2
# 1:  A  1
# 2:  A  4
# 3:  D NA

my_data_table[mDT2, on=.(V1), nomatch=0]
#    V1 V2
# 1:  A  1
# 2:  A  4
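
The join can also be wrapped in a function so that it does not depend on what the variable is called. This is just a sketch under that assumption: `subset_by_join`, `values` and `lookup` are made-up names, and the lookup column is named explicitly instead of being inferred from the variable name as in data.table(V1).

```r
library(data.table)

# Hypothetical helper: `values` may hold one or several values to look up
subset_by_join <- function(dt, values) {
  # Name the lookup column explicitly so the join column does not
  # depend on the name of the variable passed in
  lookup <- data.table(V1 = values)
  # nomatch = 0 drops lookup values that have no match in dt
  dt[lookup, on = "V1", nomatch = 0]
}

my_data_table <- data.table(V1 = c("A", "B", "C", "A"), V2 = c(1, 2, 3, 4))
V1 <- "A"
joined <- subset_by_join(my_data_table, V1)
```

Passing c("A", "D") instead should give the same two rows, since "D" has no match and is dropped by nomatch = 0.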
