I'm new to data.table and don't understand how I can modify by reference while also performing an operation on chosen columns using the .SD symbol. I have two examples.

Example 1

> DT <- data.table("group1:1" = 1, "group1:2" = 1, "group2:1" = 1)
> DT
   group1:1 group1:2 group2:1
1:        1        1        1

Let's say, for example, that I simply want to keep only the columns whose names contain "group1:". I know it's pretty straightforward to reassign the result of the operation to the same object, like so:

cols1 <- names(DT)[grep("group1:", names(DT))]
DT <- DT[, .SD, .SDcols = cols1]

From reading the data.table vignette on reference semantics, my understanding is that the above does not modify by reference, whereas a similar operation using the := operator would. Is this accurate? If so, is there a better way to do this operation that does modify by reference? In trying to figure this out, I got stuck on how to combine the .SD symbol and the := operator. I tried

DT[, c(cols1) := .SD, .SDcols = cols1]
DT[, c(cols1) := lapply(.SD,function(x)x), .SDcols = cols1]

neither of which gave the result I wanted.

Example 2

Say I want to perform a different operation, dcast, that uses .SD as input. Example data table:

> DT <- data.table(x = c(1,2,1,2), y = c("A","A","B","B"), z = 5:8)
> DT
   x y z
1: 1 A 5
2: 2 A 6
3: 1 B 7
4: 2 B 8

Again, I know I can just reassign like so:

> DT <- dcast(DT, x ~ y, value.var = "z")
> DT
   x A B
1: 1 5 7
2: 2 6 8

But I don't understand why the following does not work (or whether it would be preferable in some circumstances):

> DT <- data.table(x = c(1,2,1,2), y = c("A","A","B","B"), z = 5:8)
> cols <- c("x", unique(DT$y))
> DT[, cols := dcast(.SD, x ~ y, value.var = "z")]
Jeff Groh

1 Answer

In your example,

cols1 <- names(DT)[grep("group1:", names(DT))]
DT[, c(cols1) := .SD, .SDcols = cols1] # not this

DT[, (cols1) := .SD, .SDcols = cols1] # this will work
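Note that assigning .SD back to (cols1) rewrites those columns in place and leaves every other column untouched, so it does not subset the table. If the goal is to keep only the "group1:" columns without reassigning DT, one possible approach is to delete the remaining columns by reference by assigning NULL to them. This is a minimal sketch under that assumption, not the only way to do it; drop_cols is a name introduced here for illustration.

```r
library(data.table)

DT <- data.table("group1:1" = 1, "group1:2" = 1, "group2:1" = 1)

# Columns to keep, selected by name pattern as in the question
cols1 <- grep("group1:", names(DT), value = TRUE)

# Everything else is a candidate for removal (hypothetical helper name)
drop_cols <- setdiff(names(DT), cols1)

# Assigning NULL with := deletes columns by reference;
# the guard skips the call when there is nothing to drop
if (length(drop_cols) > 0L) DT[, (drop_cols) := NULL]

names(DT)
```

After this runs, DT itself holds only the two "group1:" columns, with no `DT <- ...` reassignment involved.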

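Below is another example, which replaces NA values with 0 in the numeric columns (selected via .SDcols) by reference.

The trick is to store the column names in a vector first, then wrap that vector in parentheses on the left-hand side of :=.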

colnames = DT[, names(.SD), .SDcols = is.numeric] # column name vector
DT[, (colnames) := lapply(.SD, nafill, fill = 0), .SDcols= is.numeric]
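To make the snippet above easier to try, here is a self-contained sketch on a small made-up table. The data and the name num_cols are assumptions for illustration only, and it assumes a data.table version recent enough to accept a function in .SDcols and to provide nafill.

```r
library(data.table)

# Toy data (illustrative): one character column, two double columns with NAs
DT <- data.table(id = c("a", "b", "c"),
                 v1 = c(1, NA, 3),
                 v2 = c(NA, 5, 6))

# Step 1: collect the names of the numeric columns into a vector
num_cols <- DT[, names(.SD), .SDcols = is.numeric]

# Step 2: overwrite those columns in place; (num_cols) is evaluated
# to the stored names rather than creating a column called "num_cols"
DT[, (num_cols) := lapply(.SD, nafill, fill = 0), .SDcols = num_cols]

DT
```

The character column id is left alone, and only the NA entries in v1 and v2 change.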
Matthew Son