Sum pairs of columns by group

Question

I wish to sum pairs of columns by group. In the example below I wish to sum pairs (v1 and v2), (v3 and v4), and (v5 and v6), each by r1, r2 and r3.

I can do this using the sapply statement below and I get the correct answer. However, the required code is complex. Could someone show me how to do the same operation perhaps in package data.table or with rollapply and/or other options? I have not yet explored those options.

Sorry if this is a duplicate.

my.data <- read.table(text= "
   r1  r2  r3    t1    t2    t3    v1   v2   v3   v4   v5   v6
    1   0   0    10    20    30     1    0    0    0    0    0
    1   0   0    10    20    30     1    1    0    0    0    0
    1   0   0    10    20    30     1    0    1    0    0    0
    1   0   0    10    20    30     1    0    1    1    0    0
    1   0   0    10    20    30     0    0    0    0    0    0

    0   1   0    10    20    30     0    1    1    1    1    1
    0   1   0    10    20    30     0    0    1    1    1    1
    0   1   0    10    20    30     0    0    0    1    1    1
    0   1   0    10    20    30     0    0    0    0    1    1
    0   1   0    10    20    30     0    0    0    0    0    1

    0   0   1    10    20    30     1    1    1    1    1    1
    0   0   1    10    20    30     1    0    1    1    1    1
    0   0   1    10    20    30     1    0    0    1    1    1
    0   0   1    10    20    30     1    0    0    0    1    1
    0   0   1    10    20    30     1    0    0    0    0    1
", header=TRUE, na.strings=NA)

my.data$my.group <- which(my.data[,1:3]==1, arr.ind=TRUE)[,2]
my.data

my.sums <- t(sapply(split(my.data[,7:(ncol(my.data)-1)], my.data$my.group), function(i) sapply(seq(2, ncol(i), 2), function(j) sum(i[,c((j-1),j)], na.rm=TRUE))))
my.sums

#   [,1] [,2] [,3]
# 1    5    3    0
# 2    1    5    9
# 3    6    5    9

Thanks. I fixed it. I removed one column just before posting and forgot to modify the grouping code. — Mark Miller, Oct 28 '13 at 20:04
when you say "by `r1`, r2`, etc.." is that respectively (relative to the column pairs?) Also, are there more columns than in this example, or just those shown here? — Ricardo Saporta, Oct 28 '13 at 20:10
The real data set has 110 columns or 55 pairs to sum and the grouping variables, r, goes from 1 to 5 instead of 1 to 3 in this example. I am not sure what you mean by your first question. — Mark Miller, Oct 28 '13 at 20:12

eddi · Accepted Answer · 2013-10-28T20:35:34.090

4

Here's a pretty general expression that you can probably simplify if you want it to match your specific data dimensions/column names/etc:

library(data.table)
dt = data.table(my.data)

dt[, lapply(1:(ncol(.SD)/2), function(x) sum(.SD[[2*x-1]], .SD[[2*x]])),
     by = eval(grep('^r', names(dt), value = TRUE)),
     .SDcols = grep('^v', names(dt), value = TRUE)]
#   r1 r2 r3 V1 V2 V3
#1:  1  0  0  5  3  0
#2:  0  1  0  1  5  9
#3:  0  0  1  6  5  9

edited Oct 28 '13 at 20:35

answered Oct 28 '13 at 20:20

eddi

49,088
6
104
155

Nice! I was expecting to see a monster 20-line answer. – Frank Oct 28 '13 at 20:26

score 1 · Answer 2 · answered Oct 28 '13 at 22:06

Also, using aggregate and mapply:

DF <- my.data

#function to sum 2 columns
fun <- function(col1, col2) 
{
 rowSums(aggregate(DF[c(col1, col2)], by = list(DF$r1, DF$r2, DF$r3), sum)[c(4, 5)])
}

#all pairs of columns, to be summed, in a matrix
#(7 is the column of v1)
args_mat <- matrix(7:ncol(DF), ncol = 2, byrow = T)

#apply `fun` to all pairs
mapply(fun, args_mat[,1], args_mat[,2])
#     [,1] [,2] [,3]
#[1,]    5    3    0
#[2,]    1    5    9
#[3,]    6    5    9

Sum pairs of columns by group

2 Answers2

Linked