
If I have a data.table and I'm calling a function in `j`, do I have access to the current value of the `by` variable?

library(data.table)
d <- data.table(x=1:10, y=c('a', 'b'))

myfun <- function(DT) {
  print (DT$y)
}


d[, myfun(.SD), by=y]

For more context, I am passing a second argument (another data.table) to myfun and want to subset that based on the current value of y. It could be done with a dummy variable, but that seems yucky...

Justin
  • Can you use `.GRP`? `d[, .GRP, by=y]` – Roland Feb 12 '14 at 16:09
    @Roland Hmm, I didn't know about `.GRP` but it is just an index correct? I could, but then I'm starting to add extra subsetting. – Justin Feb 12 '14 at 16:19
  • It should be relatively easy to turn the corresponding column in your other data.table into an integer. However, maybe there is a better approach to your real problem. You'd need to show it. – Roland Feb 12 '14 at 16:26
    @Roland To be sure. I have a few other options for my real code. But, I was mostly interested for future reference and curiosity. E.g. a progress update on long running code that prints the current `by` variable when it starts or something... – Justin Feb 12 '14 at 16:30
    I can think of two ways: 1) Add a second argument to your function and pass `y`- `my.fun(.SD, y)`. Remember that grouping variables will be of length 1 (as of now, there's an issue somewhere yet to be fixed). 2) Use `.SDcols` and add `y` as well to it, then `.SD` will contain the `y` column as well Ex: `d[, my.fun(.SD), by=y, .SDcols=1:2]` – Arun Feb 12 '14 at 16:34
  • @Arun option 1 was what I was looking for more or less. Feel free to make it an answer. I was really hoping that the grouping variable would be passed in as an attribute on `.SD` or something like that... – Justin Feb 12 '14 at 16:36
  • @Justin, it just hasn't come up is all I'd think. Feel free to file a FR if you think that might be a nice thing to have. – Arun Feb 12 '14 at 16:40

1 Answer


Use `.BY`, which is a list holding the current value of each `by` variable:

d <- data.table(x=1:10, y=c('a', 'b'))
d[, .BY[[1]], by = y]  # [[1]] to access the first by variable, which is y
                       # if you had by = list(x, y) you'd do .BY[[2]] to access y
#   y V1
#1: a  a
#2: b  b

Additionally the list is named, so you can also access it by name:

d[, .BY$y, by = y]
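
To tie this back to the question, here is a minimal sketch of passing the current group value into a function and using it to subset a second table. The `lookup` table, its `weight` column, and the three-argument `myfun` are hypothetical names made up for illustration; only `.SD` and `.BY` come from data.table itself.

```r
library(data.table)

d <- data.table(x = 1:10, y = rep(c('a', 'b'), 5))

# Hypothetical second table that should be subset by the current value of y
lookup <- data.table(y = c('a', 'b'), weight = c(2, 10))

# Hypothetical helper: receives the group's rows, the current group value,
# and the other table, then subsets the other table by that value
myfun <- function(DT, grp, other) {
  w <- other[y == grp, weight]  # rows of `other` matching the current group
  sum(DT$x) * w
}

# .BY$y is a length-1 value holding the current group of y
res <- d[, .(total = myfun(.SD, .BY$y, lookup)), by = y]
print(res)

# The same value can be used for a progress message on long-running code
invisible(d[, {cat("processing group", .BY$y, "\n"); sum(x)}, by = y])
```

Because `.BY` is only defined while `j` is being evaluated, it has to be passed to the function explicitly, as above, rather than looked up from inside `myfun`.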
eddi