0

I have a question pertaining to R.

I have some sequentially numbered matrices (all of the same dimensions) and I want to search them all and produce a final matrix that contains (for each matrix element) the number of times a defined threshold was exceeded.

As an example, I could choose a threshold of 0.7 and I could have the following three matrices.

matrix1
    [,1] [,2] [,3]
[1,] 0.38 0.72 0.15
[2,] 0.58 0.37 0.09
[3,] 0.27 0.55 0.22

matrix2
    [,1] [,2] [,3]
[1,] 0.19 0.78 0.72
[2,] 0.98 0.65 0.46
[3,] 0.72 0.57 0.76

matrix3
     [,1] [,2] [,3]
[1,] 0.39 0.68 0.31
[2,] 0.40 0.05 0.92
[3,] 1.00 0.43 0.21

My desired output would then be

      [,1] [,2] [,3]
[1,]    0    2    1
[2,]    1    0    1
[3,]    2    0    1

If I do this:

test <- matrix1 >= 0.7
test[test==TRUE] = 1

then I get a matrix that has a 1 where the threshold is exceeded, and 0 where it's not. So this is a key step in what I want to do:

test=
      [,1] [,2] [,3]
[1,]    0    1    0
[2,]    0    0    0
[3,]    0    0    0

My thought is to make a loop so I perform this calculation on each matrix and add each result of "test" so I get the final matrix I desire. But I'm not sure about two things: how to use a counter in the variable name "matrix", and second if there's a more efficient way than using a loop.

So I'm thinking of something like this:

output = matrix(0,3,3)

for i in 1:3 {

test <- matrixi >= 0.7        
test[test==TRUE] = 1
output = output + test }

Of course, this doesn't work because matrixi does not translate to matrix1, matrix2, etc.

I really appreciate your help!!!

  • Is there a reason you're keeping your matrices in separate variables an not something more convenient like a list or 3D array? – MrFlick Jun 12 '14 at 23:16
  • No, there's not. It seems like other people are suggesting using a list or 3D array. This seems like it would make more sense but I don't have experience doing that yet. My data starts as separate variables because that's how they are output from a model I'm using. – user3453009 Jun 12 '14 at 23:30

2 Answers2

2

If you stored your matrices in a list you would find the manipulations easier:

lst <- list(matrix(c(0.38, 0.58, 0.27, 0.72, 0.37, 0.55, 0.15, 0.09, 0.22), nrow=3),
            matrix(c(0.19, 0.98, 0.72, 0.78, 0.65, 0.57, 0.72, 0.46, 0.76), nrow=3),
            matrix(c(0.39, 0.40, 1.00, 0.68, 0.05, 0.43, 0.31, 0.92, 0.21), nrow=3))
Reduce("+", lapply(lst, ">=", 0.7))
#      [,1] [,2] [,3]
# [1,]    0    2    1
# [2,]    1    0    1
# [3,]    2    0    1

Here, the lapply(lst, ">=", 0.7) returns a list with x >= 0.7 called for every matrix x stored in lst. Then Reduce called with + sums them all up.

If you just have three matrices, you could just do something like lst <- list(matrix1, matrix2, matrix3). However, if you have a lot more (let's say 100, numbered 1 through 100), it's probably easier to do lst <- lapply(1:100, function(x) get(paste0("matrix", x))) or lst <- mget(paste0("matrix", 1:100)).

For 100 matrices, each of size 100 x 100 (based on your comment this is roughly the size of your use case), the Reduce approach with a list seems to be a bit faster than the rowSums approach with an array, though both are quick:

# Setup test data
set.seed(144)
for (i in seq(100)) {
    assign(paste0("matrix", i), matrix(rnorm(10000), nrow=100))
}

all.equal(sum.josilber(), sum.gavin())
# [1] TRUE
library(microbenchmark)
microbenchmark(sum.josilber(), sum.gavin())
# Unit: milliseconds
#            expr       min       lq   median       uq      max neval
#  sum.josilber()  6.534432 11.11292 12.47216 17.13995 160.1497   100
#     sum.gavin() 11.421577 16.54199 18.62949 23.09079 165.6413   100
josliber
  • 43,891
  • 12
  • 98
  • 133
  • For the last bit, you can now do `mget(paste0("matrix", 1:100))` would instead of `lapply`-ing `get()`. – Gavin Simpson Jun 12 '14 at 23:26
  • @GavinSimpson -- thanks! Added that in as an option. – josliber Jun 12 '14 at 23:27
  • Thanks! My actual data will be much larger, though. I'll have a few hundred matrices and instead of being 3x3 they'll be more like 100x100. It looks like your last code would be useful. – user3453009 Jun 12 '14 at 23:28
  • 1
    @user3453009 yeah, use `mget` or `lapply` with `get` to build `lst` and then the `Reduce` call with `lapply` to get your result. Even though you have hundreds of matrices, you should be able to get your result with 2 lines of code. – josliber Jun 12 '14 at 23:29
  • @user3453009 The more matrices you have and the bigger they are, the slower the `Reduce()` approach will become (presumably because of the intermediate `lapply()` step to convert each matrix to a logical indicator matrix). If compute time becomes an issue, the `rowSums()` approach scales much better, probably because we avoid that intermediate `lapply()` step and convert the entire array to an indicator array in a single call. – Gavin Simpson Jun 12 '14 at 23:43
0

If you put the matrices in an array, this is easy to do without a loop. Here's an example:

## dummy data
set.seed(1)
m1 <- matrix(runif(9), ncol = 3)
m2 <- matrix(runif(9), ncol = 3)
m3 <- matrix(runif(9), ncol = 3)

Stick these into an array

arr <- array(c(m1, m2, m3), dim = c(3,3,3))

Now each matrix is like a plate and the array is a stack of these plates.

Do as you did and convert the array into an indicator array (you don't need to save this step, it could be done inline in the next call)

ind <- arr > 0.7

This gives:

> ind
, , 1

      [,1]  [,2]  [,3]
[1,] FALSE  TRUE  TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE  TRUE FALSE

, , 2

      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE  TRUE
[3,] FALSE  TRUE  TRUE

, , 3

      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,]  TRUE FALSE FALSE
[3,]  TRUE FALSE FALSE

Now use the rowSums() function to compute the values you want

> rowSums(ind, dims = 2)
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    1    0    1
[3,]    1    2    1

Note that the thing that is summed over in rowSums() is (somewhat confusing!) the dimension dims + 1. In this case, we are summing the values down through the stack of plates (the array) for each 3*3 cell, hence the 9 values in the output.

If you need to get your objects into the array form, you can do this via

arr2 <- do.call("cbind", mget(c("m1","m2","m3")))
dim(arr2) <- c(3,3,3) # c(nrow(m1), ncol(m1), nmat)

> all.equal(arr, arr2)
[1] TRUE

For larger problems (more matrices) use something like

nmat <- 200 ## number matrices
matrices <- paste0("m", seq_len(nmat))
arr <- do.call("cbind", mget(matrices))
dim(arr) <- c(dim(m1), nmat)
Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • Thanks everyone!! I implemented the rowSums method and it works great. This was incredibly helpful and very pleasant. – user3453009 Jun 13 '14 at 15:14