0

I have a data-table with pairs of columns:

set.seed(1) 
dt <- data.table(a1 = round(runif(3),1), a2 = round(runif(3),1), a3 =round(runif(3),1),
                 b1 = round(runif(3),1), b2 = round(runif(3),1), b3 =round(runif(3),1))

it looks like this:

 a1  a2  a3  b1  b2  b3
0.3 0.9 0.9 0.1 0.7 0.5
0.4 0.2 0.7 0.2 0.4 0.7
0.6 0.9 0.6 0.2 0.8 1.0

Now I would like to define a subset of columns:

n<-2
cols <- paste0(rep(letters[1:2], n),rep(1:n,each=2))
"a1" "b1" "a2" "b2"

and add a new column for each pair with the difference of the pair:

dt[,.(desired_ab1=c(0.2,0.2,0.4),desired_ab2=c(0.2,-0.2,0.1))]

desired_ab1 desired_ab2
        0.2         0.2
        0.2        -0.2
        0.4         0.1

my goal looks similar to a previous post: Sum pairs of columns by group, so I tried the following:

myNames <- paste0("ab_", seq(n))
dt[, (myNames):=lapply(1:(ncol(.SD)/2), 
                       function(x) (.SD[2*(x-1)+1]-.SD[2*(x-1)+2])), .SDcols=cols][]

I don't see what happens. Can someone give me a hint where this fails?

HannesZ
  • 599
  • 2
  • 5
  • 17
  • Nice! Thank's @mt1022 that is what I was looking for. – HannesZ Nov 23 '17 at 14:06
  • @mt1022: how about promoting your comment to an answer? More helpfull for me than for you obviously, but maybe also for others? Thanks anyway, also for your explanations! – HannesZ Nov 23 '17 at 14:34
  • 1
    Please, append `[]` to get the result printed, i.e., `dt[, (myNames):=lapply(1:(ncol(.SD)/2), function(x) (.SD[2*(x-1)+1]-.SD[2*(x-1)+2])), .SDcols=cols][]` – Uwe Nov 23 '17 at 14:42

2 Answers2

2

.SD is also a data.table. You have to use [[ to subset a column (or use [, j, with = FALSE]). So this: function(x) (.SD[[2*(x-1)+1]]-.SD[[2*(x-1)+2]]) will work for you case. And when you use single [, it is actually subsetting a row not a column from .SD (like what happens for a normal data.table).

mt1022
  • 16,834
  • 5
  • 48
  • 71
1

The arithmetic to access the columns looks rather complicated and is fragile if the columns are ordered differently.

By reshaping the data from wide to long form the calculation can be simplified. The data in long form is the result of melting two measure columns simultaneously

cols <- c("a", "b")
melt(dt, measure.vars = patterns(cols), value.name = cols)]
   variable   a   b
1:        1 0.3 0.1
2:        1 0.4 0.2
3:        1 0.6 0.2
4:        2 0.9 0.7
5:        2 0.2 0.4
6:        2 0.9 0.8
7:        3 0.9 0.5
8:        3 0.7 0.7
9:        3 0.6 1.0

Now, only the difference between columns a and b needs to be calculated. Finally, the result can be reshaped to wide form again (after filtering for the desired pairs.

If all parts are put together this becomes

# reshape from wide to long format with two measure variables
melt(dt, measure.vars = patterns(cols), value.name = cols)[
  # calculate differences
  , ab := a - b][
    # select pairs of interest
    variable %in% 1:2, 
    # reshape from long to wide format
    dcast(.SD, rowid(variable) ~ paste0("ab", variable), value.var = "ab")][
      # drop row id
      , -"variable"]
   ab1  ab2
1: 0.2  0.2
2: 0.2 -0.2
3: 0.4  0.1
Uwe
  • 41,420
  • 11
  • 90
  • 134