
I have data in the form of an n×n matrix, and I want to do some computations (e.g. sum) on the elements that lie between the two diagonals (excluding the diagonals themselves).

For example for this matrix:

     [,1] [,2] [,3] [,4] [,5]
[1,]    2    0    1    4    3
[2,]    5    3    6    0    4
[3,]    3    5    2    3    1
[4,]    2    1    5    3    2
[5,]    1    4    3    4    1

The per-slice sums (diagonal elements excluded) would be:

# left slice 5+3+2+5 = 15
# bottom slice 4+3+4+5 = 16
# right slice 4+1+2+3 = 10
# top slice 0+1+4+6 = 11

# dput(m)
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0, 
3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

How can I accomplish this efficiently?
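The four slices can be stated precisely as index conditions. With row index i, column index j and n = nrow(m), the main diagonal is i == j and the anti-diagonal is i + j == n + 1, so each slice is one combination of strict inequalities. A minimal base R sketch using row() and col(), assuming a square matrix; the helper name `slice_by_index` is hypothetical:

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

# Hypothetical helper: one logical mask per slice, built from index arithmetic.
slice_by_index <- function(m) {
  stopifnot(nrow(m) == ncol(m))
  n <- nrow(m)
  i <- row(m)  # row index of every cell
  j <- col(m)  # column index of every cell
  list(
    top    = i < j & i + j < n + 1,  # above both diagonals
    left   = i > j & i + j < n + 1,  # left of both diagonals
    right  = i < j & i + j > n + 1,  # right of both diagonals
    bottom = i > j & i + j > n + 1   # below both diagonals
  )
}

masks <- slice_by_index(m)
sapply(masks, function(k) sum(m[k]))
# expected: top = 11, left = 15, right = 10, bottom = 16
```

Because every inequality is strict, cells on either diagonal never fall into any mask, and the four masks are disjoint.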

989

2 Answers


Here's how you can get the "top slice":

sum(m[lower.tri(m)[nrow(m):1,] & upper.tri(m)])
#[1] 11

To visualize it:

lower.tri(m)[nrow(m):1,] & upper.tri(m)
#      [,1]  [,2]  [,3]  [,4]  [,5]
#[1,] FALSE  TRUE  TRUE  TRUE FALSE
#[2,] FALSE FALSE  TRUE FALSE FALSE
#[3,] FALSE FALSE FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE FALSE FALSE
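Why reversing the rows works: lower.tri() is TRUE where row > col, so after reversing the rows, cell (i, j) reads the old cell (n + 1 - i, j) and is TRUE exactly when n + 1 - i > j, i.e. i + j < n + 1 (strictly above the anti-diagonal). Reversing upper.tri() likewise gives i + j > n + 1 (strictly below it). A small base R check of both claims, with n = 5 chosen only for illustration:

```r
n  <- 5
lo <- lower.tri(matrix(0, n, n))  # TRUE where row > col
up <- upper.tri(matrix(0, n, n))  # TRUE where row < col
i  <- row(lo)
j  <- col(lo)

# Reversed lower.tri: strictly above the anti-diagonal
all(lo[n:1, ] == (i + j < n + 1))
# [1] TRUE

# Reversed upper.tri: strictly below the anti-diagonal
all(up[n:1, ] == (i + j > n + 1))
# [1] TRUE
```

Combining one reversed mask with one unreversed mask therefore selects the cells on a given side of both diagonals.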

Here's how you can compute all 4 of the slices:

up <- upper.tri(m)
lo <- lower.tri(m)
n <- nrow(m)

# top
sum(m[lo[n:1,] & up])
# left
sum(m[lo[n:1,] & lo])
# right
sum(m[up[n:1,] & up])
# bottom
sum(m[up[n:1,] & lo])
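Since the question mentions sum only as an example, the four masks can be wrapped in a small helper that takes the computation as an argument and works for any square matrix. A sketch; `slice_stats` and its `f` argument are hypothetical names, and `f` is assumed to return a single number:

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

# Hypothetical helper: apply f to each of the four slices of a square matrix.
slice_stats <- function(m, f = sum) {
  stopifnot(nrow(m) == ncol(m))
  n  <- nrow(m)
  up <- upper.tri(m)
  lo <- lower.tri(m)
  masks <- list(
    top    = lo[n:1, , drop = FALSE] & up,
    left   = lo[n:1, , drop = FALSE] & lo,
    right  = up[n:1, , drop = FALSE] & up,
    bottom = up[n:1, , drop = FALSE] & lo
  )
  # vapply() enforces that f returns exactly one number per slice
  vapply(masks, function(k) f(m[k]), numeric(1))
}

slice_stats(m)       # expected: top = 11, left = 15, right = 10, bottom = 16
slice_stats(m, max)  # expected: top = 6, left = 5, right = 4, bottom = 5
```

The masks are built once per call, so computing several statistics on the same matrix could reuse them instead of calling the helper repeatedly.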
talat
sum(sapply(1:dim(m)[[2L]], function(i) sum(m[c(-i,-(dim(m)[[1L]]-i+1)),i])))

This goes column by column: for each column it removes the diagonal elements and sums the rest. These partial results are then summed up.
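To see the intermediate step, the per-column partial sums can be inspected before they are added up. In column i of an n×n matrix the two diagonal cells sit in rows i and n - i + 1; for the middle column of an odd-sized matrix these are the same row, and the repeated negative index simply drops it once. A sketch of the same one-liner, using vapply() instead of sapply() so the result type is fixed:

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))
n <- nrow(m)

partial <- vapply(seq_len(ncol(m)), function(i) {
  sum(m[c(-i, -(n - i + 1)), i])  # drop the two diagonal cells of column i
}, numeric(1))

partial
# [1] 10  9 15 11  7
sum(partial)
# [1] 52
```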

I believe this would be fast because it goes column by column, and matrices in R are stored column by column (i.e. it is CPU-cache friendly). It also does not have to build a large vector of indices, only a vector of two indices (the ones removed) for each column.

EDIT: I read the question again more carefully. The code can be updated so that each element in sapply produces four values, one for each of the regions. The idea stays the same: for a large matrix, it will be fast if you go column by column rather than jumping back and forth between columns.
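The EDIT's idea, a single pass over the columns returning four values per column, might look like this (base R; `slice_sums_by_col` is a hypothetical name). In column j the two diagonals cross at rows a = min(j, n + 1 - j) and b = max(j, n + 1 - j): rows above a belong to the top slice, rows below b to the bottom slice, and rows strictly between them to the left slice when j is in the left half (j < n + 1 - j) and to the right slice when it is in the right half.

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

# Hypothetical helper: one pass over the columns, four partial sums per column.
slice_sums_by_col <- function(m) {
  stopifnot(nrow(m) == ncol(m))
  n <- nrow(m)
  per_col <- vapply(seq_len(n), function(j) {
    x <- m[, j]             # the whole column, stored contiguously
    a <- min(j, n + 1 - j)  # upper diagonal crossing in this column
    b <- max(j, n + 1 - j)  # lower diagonal crossing in this column
    top    <- if (a > 1) sum(x[seq_len(a - 1)]) else 0
    bottom <- if (b < n) sum(x[(b + 1):n]) else 0
    # rows strictly between the two crossings (none in the middle column)
    mid    <- if (b - a > 1) sum(x[(a + 1):(b - 1)]) else 0
    c(top    = top,
      left   = if (j < n + 1 - j) mid else 0,
      right  = if (j > n + 1 - j) mid else 0,
      bottom = bottom)
  }, c(top = 0, left = 0, right = 0, bottom = 0))
  rowSums(per_col)  # per_col is a 4 x n matrix; add the partial sums across columns
}

slice_sums_by_col(m)  # expected: top = 11, left = 15, right = 10, bottom = 16
```

Each column is read once and only through contiguous ranges of rows, which is the access pattern the answer argues for; no timings are claimed here.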

Steves
  • well, yes, but column-wise: the data in your first column are one chunk of memory, followed by the data in the second column, etc. Although it is all contiguous memory, it is faster to access it locally https://en.wikipedia.org/wiki/Locality_of_reference – Steves Jun 24 '16 at 09:12
  • If the total sum of all slices were desired, why not simply: `diag(m)=0; diag(m[5:1,])=0; sum(m);` which equals 52. But I want to do the computations per slice separately. – 989 Jun 24 '16 at 09:12
  • if zeroing the diagonals is OK, then this will probably be faster – Steves Jun 24 '16 at 09:14
  • even if it is not OK, we could simply take a copy of the matrix. – 989 Jun 24 '16 at 09:16
  • but then we are copying a potentially large matrix; I am not sure this would perform better – Steves Jun 24 '16 at 09:39
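The zeroing trick from the comments can be wrapped in a function so the caller's matrix is left intact: R's copy-on-modify semantics mean the function works on its own full copy once it writes to the argument, which is the copying cost raised in the last comment. A sketch; `off_diagonal_total` is a hypothetical name, and it yields only the combined total, not the per-slice sums:

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

# Hypothetical helper: total of all off-diagonal cells via zeroing on a copy.
off_diagonal_total <- function(m) {
  n <- nrow(m)
  diag(m) <- 0         # zero the main diagonal (the local copy is made here)
  diag(m[n:1, ]) <- 0  # zero the anti-diagonal through the row-reversed view
  sum(m)
}

off_diagonal_total(m)  # expected: 52
m[3, 3]                # still 2: the caller's matrix is unchanged
```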