Compute matrix of sums

Question

Suppose I have a data.frame with several columns of categorical data, and one column of quantitative data. Here's an example:

my_data <- structure(list(A = c("f", "f", "f", "f", "t", "t", "t", "t"), 
                          B = c("t", "t", "t", "t", "f", "f", "f", "f"), 
                          C = c("f","f", "t", "t", "f", "f", "t", "t"), 
                          D = c("f", "t", "f", "t", "f", "t", "f", "t")),
                     .Names = c("A", "B", "C", "D"), 
                     row.names = 1:8, class = "data.frame")
my_data$quantity <- 1:8

Now my_data looks like this:

  A B C D quantity
1 f t f f        1
2 f t f t        2
3 f t t f        3
4 f t t t        4
5 t f f f        5
6 t f f t        6
7 t f t f        7
8 t f t t        8

What's the most elegant way to get a cross tab / sum of quantity where both values =='t'? That is, I'm looking for an output like this:

   A   B   C   D  
A "?" "?" "?" "?"
B "?" "?" "?" "?"
C "?" "?" "?" "?"
D "?" "?" "?" "?"

..where the intersection of x/y is the sum of quantity where x=='t' and y=='t'. (I only care about half this table, really, since half is duplicated)

So for example the value of A/C should be:

good_rows <- with(my_data, A=='t' & C=='t')
sum(my_data$quantity[good_rows])

15

*Edit: What I already had was:

nodes <- names(my_data)[-ncol(my_data)]
sapply(nodes, function(rw) {
  sapply(nodes, function(cl) {
    good_rows <- which(my_data[, rw]=='t' & my_data[, cl]=='t')
    sum(my_data[good_rows, 'quantity'])
  })
})

Which gives the desired result:

   A  B  C  D
A 26  0 15 14
B  0 10  7  6
C 15  7 22 12
D 14  6 12 20

I like this solution because, being very 'literal', it's fairly readable: two apply funcs (aka loops) to go through rows * columns, compute each cell, and produce the matrix. Also plenty fast enough on my actual data (tiny: 192 rows x 10 columns). I didn't like it because it seems like a lot of lines. Thank you for the answers so far! I will review and absorb.

Since you're asking for an "elegant" way rather than "any" way, would you mind posting what you have now? That way we don't end up rewriting code you've already written. — shadowtalker, Sep 30 '14 at 23:59

score 6 · Accepted Answer · answered Oct 01 '14 at 00:42

6

Try using matrix multiplication

temp <- (my_data[1:4]=="t")*my_data$quantity

t(temp) %*% (my_data[1:4]=="t") 

#   A  B  C  D
#A 26  0 15 14
#B  0 10  7  6
#C 15  7 22 12
#D 14  6 12 20

(Although this might be a fluke)

answered Oct 01 '14 at 00:42

user20650

24,654
5
56
91

Beauty! Thanks. Here's a way to get it even a tiny bit leaner on the page: `tf <- my_data[, 1:4]=='t'; t(tf*my_data$quantity) %*% tf` – arvi1000 Oct 01 '14 at 02:19

score 3 · Answer 2 · answered Oct 01 '14 at 00:23

For each row name, you could build a vector dat that's just the rows with that value equal to t. Then you could multiply the true/false values in this data subset by that row's quantity value (so it's 0 when false and the quantity value when true), finally taking the column sum.

sapply(c("A", "B", "C", "D"), function(x) {
  dat <- my_data[my_data[,x] == "t",]
  colSums((dat[,-5] == "t") * dat[,5])
})
#    A  B  C  D
# A 26  0 15 14
# B  0 10  7  6
# C 15  7 22 12
# D 14  6 12 20

Thanks! Somewhat similar approach as I had in mind (with step 1 being row selection), but yours avoids the second sapply. Upvoted. — arvi1000, Oct 01 '14 at 02:26

Compute matrix of sums

2 Answers2