
I am trying to learn data.table and am stuck at an early stage :) I know how to do this in base R, but not in data.table:

library(data.table)
x<-data.table(a=1:5,b=2:6,c=3:7)
x
myvars = names(x)[2:3]

First, I just want to view the columns of my data.table that are named in the 'myvars' vector. This is very important to me because I frequently work with a lot of variables in 'myvars':

x[, myvars]   # I understand why it's not working
x[, .(b,c)]   # This is working, I understand why
x[, .(myvars)] # This is not working - why? How can I make it work?

I don't have the luxury of typing out all the variable names every time; I need to keep them in a vector. Further, I want to run "table" (or any function) on each of myvars, like this (using base R):

X<-data.frame(a=1:5,b=2:6,c=3:7)
lapply(X[myvars],table)

How can I do it in data.table?

Thanks a lot!

user2323534
  • `x[, myvars, with = FALSE]` and `x[, lapply(.SD, table), .SDcols = myvars]` although `table` is not a good example here (since generally it does not return vectors of the same size). Please study the [new vignettes](https://github.com/Rdatatable/data.table/issues/944). – Roland Mar 11 '15 at 15:54
  • So, do I understand that I cannot use table with data.table? Here is an example: x<-data.table(a=c(1,1,3,4,4),b=c(2,2,2,3,3),c=c(3,4,4,5,5)); myvars = names(x)[2:3]; x[, lapply(.SD, table), .SDcols = myvars] – user2323534 Mar 11 '15 at 16:03
  • use what @Roland gave and `lapply` as you used it for the data.frame: `lapply(x[,myvars,with=FALSE],table)` – DaveTurek Mar 11 '15 at 16:09
  • Thanks. I just thought there might be a short/nifty way of doing it in data.table. – user2323534 Mar 11 '15 at 16:11
  • Exactly, data.table objects inherit from class data.frame and thus can also be treated as lists. And in this case you don't want a data.table as return value. – Roland Mar 11 '15 at 16:11
  • @Roland: I recently had difficulty using `table` and `tapply` within the data.table `j`-argument and gave up and went back to my data.frame methods. It seemed as though the `table` function actually took longer when called inside `data.table[`. So I was wondering if your first and second comments taken together are applicable to my situation? (I've been meaning to write up a question with "real" example and can still do so if this seems unacceptably vague.) – IRTFM Mar 11 '15 at 16:30
  • @BondedDust I have no idea why you'd want to use `tapply` inside data.table. Since I don't understand what you are trying to achieve I believe asking a question might be best. – Roland Mar 11 '15 at 19:25
  • I was building two matrices, one using `table` and the other using `tapply` and then dividing them element-wise. – IRTFM Mar 11 '15 at 19:30
  • It could be that I was barking up the wrong tree. This question may hold the key to my further DT progress: http://stackoverflow.com/questions/24295482/matrix-operations-and-component-wise-addition-using-data-table – IRTFM Mar 11 '15 at 19:53
  • Well, you are supposed to be able to do a lot of stuff (http://blog.datacamp.com/data-table-r-tutorial/) with the `j` part of the `data.table`, including printing a table: `invisible(x[, print(lapply(.SD, table)), .SDcols = myvars])`. I'm not sure I'd want to do it this way, but... – DaveTurek Mar 12 '15 at 12:14
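
Pulling together the suggestions in the comments above, a minimal sketch might look like the following. It reuses the example data from the second comment and only the calls already mentioned in the thread (`with = FALSE`, `.SD` with `.SDcols`, and base `lapply`). The names `sub1`, `sub2`, and `tabs` are made up for illustration.

```r
library(data.table)

# Example data taken from the comment thread
x <- data.table(a = c(1, 1, 3, 4, 4),
                b = c(2, 2, 2, 3, 3),
                c = c(3, 4, 4, 5, 5))
myvars <- names(x)[2:3]   # column names held in a character vector

# Select the columns named in myvars; with = FALSE tells data.table
# to treat j as a vector of column names rather than an expression
sub1 <- x[, myvars, with = FALSE]

# The same selection expressed through .SD restricted by .SDcols
sub2 <- x[, .SD, .SDcols = myvars]

# A data.table is also a list of columns, so base lapply() can apply
# table() (or any other function) to each selected column.
# The result is a plain list, not a data.table.
tabs <- lapply(sub1, table)
print(tabs)
```

Returning a plain list sidesteps the issue Roland points out: the frequency tables for different columns generally have different lengths, so they do not fit naturally into the columns of a single data.table.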
