5

I am trying to figure out how I can perform logical operators when I use indexing in data.table package in R?

Following is the example. I make a datatable named as dt. and then make the var2 as the key in my datatable:

> dt = data.table(var1 = rep(LETTERS[1:5],2), var2 = seq(1,20, 2), var3 = ceiling(rnorm(10, 3, 2)))
> dt
    var1 var2 var3
 1:    A    1    5
 2:    B    3    3
 3:    C    5    0
 4:    D    7    6
 5:    E    9    3
 6:    A   11    4
 7:    B   13    2
 8:    C   15    1
 9:    D   17    3
10:    E   19    7

> setkey(dt, var2)

So now I want to identify all the values in my already defined key (var2) which are less than 10 ( <10). Doing the following tries give me errors.

> dt[ < 10]
Error: unexpected '<' in "dt[ <"
> dt[ .< 10]
Error in eval(expr, envir, enclos) : object '.' not found
> dt[ .(< 10)]

my expectation would be :

     var1 var2 var3

 1:    A   11    4
 2:    B   13    2
 3:    C   15    1
 4:    D   17    3
 5:    E   19    7

BTW, I know that just by doing dt[var2 <10] I will get the result. BUT please consider that I want to get the concept of Indexing in data.table and understand and know how to do it without calling the key(var2) in every each of my command!

Any help with explanation is highly appreciated.

Ajay Ohri
  • 3,382
  • 3
  • 30
  • 60
Daniel
  • 1,202
  • 2
  • 16
  • 25
  • 1
    `dt[var2 < 10]`? With the explanation being that you need to specify *something* to be less than 10? – Gregor Thomas Jul 03 '17 at 03:07
  • 1
    @Gregor, I know this and it is very obvious! But the concept of indexing is that I DO NOT need to call the column name (key) every time. so , how I can do it without calling `var2` in my command? – Daniel Jul 03 '17 at 03:10
  • 1
    From `?setkey` or `?key` it looks like you can call `key()` on a table to get the key, so `dt[dt[[key(dt)]] < 10]` works for that. You also might want to edit the question to make it state that you want to do it programmatically without calling `var2` in your command. – Gregor Thomas Jul 03 '17 at 03:12
  • 1
    @Gregor thank you for the information. Although ur answer is giving the correct output but since the concept of `indexing` in `data.table` is to avoid writing LONG command, I am expecting there should be another way much easier to address it. – Daniel Jul 03 '17 at 03:38
  • 1
    The short way is `dt[var2 < 10]`, but you say you don't want the short way. You have several complicating factors: you want to reference the `key` column programmatically and you assume there is a single `key` column. I wish you luck in finding a shorter way, but it seems foolish to care so much about a 9-keystroke difference--less if the name of your key column is longer. – Gregor Thomas Jul 03 '17 at 04:32
  • 2
    I'm afraid `dt[J()]` currently doesn't work for *non-equi* joins, so you can either specify the range you want to join with `dt[J(0:9), nomatch = 0L]` or do something like `dt[get(key(dt)) < 10]` or `dt[eval(as.name(key(dt))) < 10]` or what Gregor suggested. – David Arenburg Jul 03 '17 at 05:54

2 Answers2

3

From ?setkey, key(dt) get the key columns in a character vector. Assuming your table has a single key column, then you can get what you want with:

dt[dt[[key(dt)]] < 10]

Thanks to David Arenburg, you can also use get():

dt[get(key(dt)) < 10]

This is a little bit shorter and probably the way to go.

The other way I can think to do it is much worse:

dt[eval(parse(text = paste(key(dt), "< 10")))]
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
1

from documentation https://www.rdocumentation.org/packages/data.table/versions/1.10.4/topics/setkey

Here is a key to the solution , if possible

> library(data.table)
data.table 1.10.4
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
> data(mtcars)
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

> mtcars=data.table(mtcars)
> setkey(mtcars,mpg)
> key(mtcars)
[1] "mpg"


> mtcars[mpg<15,,]
    mpg cyl disp  hp drat    wt  qsec vs am gear carb
1: 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2: 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3: 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
4: 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
5: 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
> mtcars["mpg"<15,,]
Empty data.table (0 rows) of 11 cols: mpg,cyl,disp,hp,drat,wt...

The problem lies that key(DT) is giving "var2" while the subset in a datatable demands var2 (without the quotes) - we get this using get

So now using Remove quotes from a character vector in R

This is the simplest way

#get(key(mtcars))

    > mtcars[get(key(mtcars))<15]
    mpg cyl disp  hp drat    wt  qsec vs am gear carb
1: 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2: 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3: 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
4: 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
5: 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4

For your datatable it will be

DT[get(key(DT))<10]

which is the same as @DavidArenburg 's simple and elegant answer

Ajay Ohri
  • 3,382
  • 3
  • 30
  • 60