
I need to calculate the mean of each off-diagonal in an n × n matrix. The lower and upper triangles are redundant. Here's the code I'm currently using:

A <- replicate(500, rnorm(500))
sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)]))

This seems to work, but it does not scale well to larger matrices. The ones I have aren't huge (around 2000^2 to 5000^2), but even at 1000^2 it's taking longer than I'd like:

A <- replicate(1000, rnorm(1000)) 
system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
   user  system elapsed 
 26.662   4.846  31.494 

Is there a smarter way of doing this?

Edit: To clarify, I'd like the mean of each diagonal independently, e.g. for:

 1 2 3 4
 1 2 3 4
 1 2 3 4
 1 2 3 4

I would like:

 mean(c(1,2,3))
 mean(c(1,2))
 mean(1)
blmoore

2 Answers


You can get a significant speedup just by extracting the diagonals directly using linear addressing: superdiag here extracts the i-th superdiagonal from A (i = 1 is the principal diagonal):

superdiag <- function(A, i) {
  # extract the i-th superdiagonal of A by linear (column-major) indexing;
  # i = 1 is the principal diagonal
  n <- nrow(A)
  len <- n - i + 1              # number of elements on this diagonal
  r <- 1:len                    # row indices
  c <- i:n                      # column indices
  indices <- (c - 1) * n + r    # linear positions in the underlying vector
  A[indices]
}

superdiagmeans <- function(A) {
  # mean of each superdiagonal, skipping the principal diagonal (i = 1)
  sapply(2:nrow(A), function(i) mean(superdiag(A, i)))
}

Running this on a 1K square matrix gives a ~800x speedup:

> A <- replicate(1000, rnorm(1000))

> system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
   user  system elapsed 
 26.464   3.345  29.793 

> system.time(superdiagmeans(A))
   user  system elapsed 
  0.033   0.006   0.039 

This gives you results in the same order as the original.
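As a quick sanity check, the two approaches should agree on a small toy matrix; here B is a hypothetical 4x4 matrix whose columns are all 1:4:

B <- matrix(rep(c(1, 2, 3, 4), 4), ncol = 4)   # small test matrix, columns all 1:4

sapply(1:(nrow(B)-1), function(x) mean(B[row(B) == (col(B) - x)]))
# [1] 2.0 1.5 1.0

superdiagmeans(B)
# [1] 2.0 1.5 1.0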

Jonathan Dursi
  • Nice use of indices. I vote for this one as the accepted answer, as it illustrates how powerful indices can be. – Joris Meys Dec 17 '12 at 14:05
  • 1
    Thank you, but yours is much clearer, @JorisMeys ; this approach would be worth the extra complication only if it's something you have to do a _lot_ and every tenth of a second ads up. – Jonathan Dursi Dec 17 '12 at 14:25
  • It's very smart - I had to work through the indices generation to understand what was going on. Thanks for the answer – blmoore Dec 17 '12 at 14:31
  • 1
    I'd say both answers are both "cool" and demonstrative of one feature or another of `R` . Cheers to both of you. – Carl Witthoft Dec 17 '12 at 16:02

You can use the following function:

diagmean <- function(x){
  id <- row(x) - col(x)        # offset is constant along each diagonal
  sol <- tapply(x, id, mean)   # group the entries by diagonal and average
  sol[names(sol) != '0']       # drop the principal diagonal
}
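To illustrate the output format, here is a quick example on a hypothetical 4x4 matrix B whose columns are all 1:4; the names of the result are the offsets row - col (negative for superdiagonals, positive for subdiagonals):

B <- matrix(rep(c(1, 2, 3, 4), 4), ncol = 4)
diagmean(B)
#  -3  -2  -1   1   2   3 
# 1.0 1.5 2.0 3.0 3.5 4.0 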

If we check this on your matrix, the speed gain is substantial:

> system.time(diagmean(A))
   user  system elapsed 
   2.58    0.00    2.58 

> system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
   user  system elapsed 
  38.93    4.01   42.98 

Note that this function calculates the means for both the upper and lower triangles. Since your matrix is symmetric, you only need one of them; the version below keeps just the upper triangle (the superdiagonals):

diagmean <- function(A){
  id <- row(A) - col(A)
  id[id >= 0] <- NA        # blank out the principal diagonal and the lower triangle
  tapply(A, id, mean)      # entries with an NA index are ignored
}

This results in another speed gain. Note that the result comes out in the reverse order compared to yours:

> A <- matrix(rep(c(1,2,3,4),4),ncol=4)

> sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)]))
[1] 2.0 1.5 1.0

> diagmean(A)
 -3  -2  -1 
1.0 1.5 2.0 
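If you need the values in the same order as your original code, a simple option is to reverse the named result:

> rev(diagmean(A))
 -1  -2  -3 
2.0 1.5 1.0 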
Joris Meys