155

How can we select multiple columns using a vector of their numeric indices (positions) in data.table?

This is how we would do it with a data.frame:

df <- data.frame(a = 1, b = 2, c = 3)
df[ , 2:3]
#   b c
# 1 2 3
Henrik
jamborta

5 Answers

196

For versions of data.table >= 1.9.8, the following all just work:

library(data.table)
dt <- data.table(a = 1, b = 2, c = 3)

# select single column by index
dt[, 2]
#    b
# 1: 2

# select multiple columns by index
dt[, 2:3]
#    b c
# 1: 2 3

# select single column by name
dt[, "a"]
#    a
# 1: 1

# select multiple columns by name
dt[, c("a", "b")]
#    a b
# 1: 1 2

For versions of data.table < 1.9.8 (for which numerical column selection required the use of with = FALSE), see this previous version of this answer. See also NEWS on v1.9.8, POTENTIALLY BREAKING CHANGES, point 3.
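
For reference, here is a minimal sketch of the older form mentioned above, using the same `dt` as in this answer. `with = FALSE` tells data.table to treat `j` as a vector of column positions or names rather than as an expression. It is still accepted in current versions, so it is a reasonable choice for code that has to run on both old and new installs. The variable names `by_index` and `by_name` are only illustrative.

```r
library(data.table)
dt <- data.table(a = 1, b = 2, c = 3)

# select multiple columns by index (the form required before 1.9.8)
by_index <- dt[, 2:3, with = FALSE]

# select multiple columns by name (the form required before 1.9.8)
by_name <- dt[, c("a", "b"), with = FALSE]

names(by_index)  # "b" "c"
names(by_name)   # "a" "b"
```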

Henrik
Josh O'Brien
  • No problem. Compare also `dt[,"a"]` and `dt[,"a", with=FALSE]` to see what a helpful option it really is. – Josh O'Brien Nov 14 '12 at 17:41
  • Any way to do this without `with`? For example `DT[,list(b:c)]`. I found it convenient to transform the columns directly in the data table, e.g. I can do `DT[,list(1/b,2*c)]`, but this does not work with `with`. – jamborta Nov 14 '12 at 18:00
  • You could use `.SDcols=2:3` or the character vector of names. Not sure what you mean regarding the example not working with `with`. – mnel Nov 14 '12 at 21:35
  • If you want to set by reference, `DT[, c("b","c") := list(1/b, 2 * c)]` should work. See the help for `:=`. – mnel Nov 14 '12 at 21:39
  • A change to the package will make `with=FALSE` unnecessary in this case: https://github.com/Rdatatable/data.table/issues/1188#issuecomment-127824969 – Frank Aug 24 '15 at 14:27
  • @Frank -- That's great news! Thanks for bringing it to my attention. Once that change makes its way into the version of **data.table** distributed on CRAN, I'll edit this answer to announce the change up top. (And please -- you or anyone else who reads this -- feel free to ping me with a reminder as soon as that happens.) – Josh O'Brien Aug 24 '15 at 23:28
  • @Frank Is there any version available that makes `with=FALSE` unnecessary for the cases explained in your link? Thanks – Ferran E Jan 15 '16 at 10:49
  • @FerranE Not yet. At the top of the linked page, it says "Open" in green, which will switch to "Closed" when it's done. There'll also be a new message towards the bottom of the page. I think it might be a while until this is implemented. – Frank Jan 15 '16 at 13:08
  • Is there a data.frame-compatible way? – Valentas Oct 18 '16 at 11:46
  • @Valentas Funny you should ask. There is not a `data.frame`-compatible way to use `with=FALSE`. However, as of about 3 weeks ago, the development version of **data.table** has been modified so that calls like `dt[, 2]`, `dt[, 2:3]`, `dt[, "b"]`, and `dt[, c("b", "c")]` behave the same as they do with `data.frame`s **without** having to explicitly set `with=FALSE`. It's terrific! [See here](https://github.com/Rdatatable/data.table/commit/f78d7901dc5063a5157d37db4831e418ad021fe6) for the particular commit, including the NEWS entry describing the change. – Josh O'Brien Oct 18 '16 at 15:41
  • @Valentas [Here](https://github.com/Rdatatable/data.table/wiki/Installation), in case you need them, are the instructions for how to install the development version. (Do be aware, of course, that the development version, right up there on the bleeding edge, is more likely to have bugs resulting from recent possibly breaking changes to the code base.) – Josh O'Brien Oct 18 '16 at 15:44
  • @JoshO'Brien Why is `dt[, c("b", "c")]` not working for me? It gives `"b" "c"` as output. – Herman Toothrot Oct 21 '16 at 13:19
  • @user4050 That syntax only works with the current development version of **data.table**. For all earlier versions, one of which you are almost certainly using, you need to use `dt[, c("b", "c"), with=FALSE]` (as discussed in the main body of the question). – Josh O'Brien Oct 21 '16 at 15:16
  • @JoshO'Brien Is there any way to mix indices and column names? – Herman Toothrot May 29 '20 at 09:30
45

It's a bit verbose, but I've gotten used to using the hidden .SD variable.

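library(data.table)
dt <- data.table(a = 1, b = 2, c = 3, d = 4)
# .SDcols picks the columns (here, by position) that .SD will contain
dt[, .SD, .SDcols = 1:2]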

It's a bit of a hassle, but I don't think you lose any other data.table features this way, so you should still be able to do things like join tables.
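
One thing `.SD` offers beyond plain selection is that the chosen columns can be transformed in the same call. The following is a small sketch of that idea, not part of the original answer. The table and the doubling function are made up for illustration.

```r
library(data.table)
dt <- data.table(a = 1, b = 2, c = 3, d = 4)

# pick columns 2 and 3 by position, then apply a function to each of them
doubled <- dt[, lapply(.SD, function(x) x * 2), .SDcols = 2:3]

names(doubled)  # "b" "c"
doubled$b       # 4
doubled$c       # 6
```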

Artem Klevtsov
Tom
39

If you want to use column names to select the columns, simply use .(), which is an alias for list():

library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)
dt[ , .(b, c)] # select the columns b and c
# Result:
#    b c
# 1: 2 3
# 2: 3 4
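
Because `.()` is just `list()`, its elements can be expressions as well as bare column names. As a hedged illustration (the column name `c_doubled` is invented here), this keeps `b` as it is and derives a new column from `c`:

```r
library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# keep b unchanged and add a column computed from c
res <- dt[, .(b, c_doubled = 2 * c)]

names(res)     # "b" "c_doubled"
res$c_doubled  # 6 8
```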
Henrik
R Yoda
22

From v1.10.2 onwards, you can also use the .. prefix on a variable that holds the column names:

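library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# names of the columns to keep, stored in an ordinary variable
keep_cols <- c("a", "c")

# the .. prefix looks keep_cols up outside dt rather than as a column name
dt[, ..keep_cols]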
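
Since the question asks about numeric positions, it may help to note that the same prefix also works when the variable holds column positions. This is a minimal sketch. `keep_idx` is an illustrative name, not something from the original answer.

```r
library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

keep_idx <- c(1, 3)      # column positions instead of names
res <- dt[, ..keep_idx]

names(res)  # "a" "c"
```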
rafa.pereira
3

@Tom, thank you very much for pointing out this solution. It works great for me.

I was looking for a way to exclude just one column from the printed output. Building on the example above, you can exclude the second column like this:

library(data.table)
dt <- data.table(a=1:2, b=2:3, c=3:4)
dt[,.SD,.SDcols=-2]
dt[,.SD,.SDcols=c(1,3)]
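
If you know the unwanted column by name rather than by position, one way that relies only on base R is to build the vector of names to keep with `setdiff()` and pass it to `.SDcols`. This is a sketch under that assumption. `keep` is just an illustrative variable name.

```r
library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# build the vector of columns to keep by removing the unwanted name
keep <- setdiff(names(dt), "b")
res <- dt[, .SD, .SDcols = keep]

names(res)  # "a" "c"
```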
Bhoom Suktitipat