Ways to improve for loop for matrix manipulations depending on another matrix

Question

I know that improving for loops has been asked about many times before, and that the apply family of functions can often replace a for loop in R.

However, is there a way to improve manipulations of a matrix when those manipulations depend on another matrix? For example, here the elements of test that I set to 2 are determined by the matrix index:

for (i in 1:nrow(test)){
  test[i,index[i,]]  <- 2
}    # where index is a predetermined matrix

Another example is this, where I set the values in test based on the ordering of elements in the rows of another matrix anyMatrix:

for (i in 1:nrow(test)){
   test[i,] <- order(anyMatrix[i,])
}

I could use lapply or sapply here, but they return a list, and converting it back to a matrix takes about the same amount of time.

Reproducible example:

test <- matrix(0, nrow = 10, ncol = 10)
set.seed(1234)
index <- matrix(sample.int(10, 10*10, TRUE), 10, 10)
anyMatrix <- matrix(rnorm(10*10), nrow = 10, ncol = 10)

for (i in 1:nrow(test)){
  test[i,index[i,]]  <- 2
}

for (i in 1:nrow(test)){
   test[i,] <- order(anyMatrix[i,])
}

josliber · Answer 1 · 2016-02-06T17:33:16.380

You really appear to have two separate problems here.

Problem 1: Given a matrix index, for each row i and column j you want to set test[i,j] to 2 if j appears in row i of index. This can be done with matrix indexing: pass a 2-column matrix in which the first column holds the row number and the second column holds the column number of each element you want to set:

test[cbind(as.vector(row(index)), as.vector(index))] <- 2
test
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#  [1,]    2    2    0    2    2    2    2    0    2     2
#  [2,]    2    0    2    2    2    2    2    0    2     2
#  [3,]    2    2    2    2    0    0    2    2    0     0
#  [4,]    2    2    0    0    0    2    2    2    0     2
#  [5,]    2    2    2    2    0    0    0    0    2     0
#  [6,]    0    0    0    0    0    2    2    2    2     0
#  [7,]    2    0    2    2    2    2    2    0    0     0
#  [8,]    2    0    2    2    2    2    0    2    0     2
#  [9,]    2    2    2    2    0    0    2    0    2     2
# [10,]    2    0    2    0    0    2    2    2    2     0

Since this performs every assignment in a single vectorized operation, it should be faster than looping through the rows and handling them individually. Here's a benchmark with 1 million rows and 10 columns:

OP <- function(test, index) {
  for (i in 1:nrow(test)){
    test[i,index[i,]]  <- 2
  }
  test
}
josliber <- function(test, index) {
  test[cbind(as.vector(row(index)), as.vector(index))] <- 2
  test
}
test.big <- matrix(0, nrow = 1000000, ncol = 10)
set.seed(1234)
index.big <- matrix(sample.int(10, 1000000*10, TRUE), 1000000, 10)
identical(OP(test.big, index.big), josliber(test.big, index.big))
# [1] TRUE
system.time(OP(test.big, index.big))
#    user  system elapsed 
#   1.564   0.014   1.591 
system.time(josliber(test.big, index.big))
#    user  system elapsed 
#   0.408   0.034   0.444

Here, the vectorized approach is roughly 3.6x faster (1.591 s vs. 0.444 s elapsed).
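To make the two-column indexing concrete, here is a minimal sketch on toy data. The names m, idx, pairs and m_loop are made up for illustration and are not part of the question. It builds the (row, column) pairs explicitly and checks that the result matches the row-by-row loop:

```r
# Toy data (hypothetical, not from the question): 3 rows, 4 columns
m <- matrix(0, nrow = 3, ncol = 4)

# Row i of idx lists the columns to set in row i of m.
# matrix() fills column-wise, so the rows are (1, 4), (2, 2), (4, 3).
idx <- matrix(c(1, 2, 4,
                4, 2, 3), nrow = 3)

# One (row, column) pair per element of idx
pairs <- cbind(as.vector(row(idx)), as.vector(idx))

# A single vectorized assignment covers every pair at once
m[pairs] <- 2

# Same result with the original row-by-row loop, for comparison
m_loop <- matrix(0, nrow = 3, ncol = 4)
for (i in seq_len(nrow(m_loop))) {
  m_loop[i, idx[i, ]] <- 2
}

identical(m, m_loop)
```

Duplicate pairs, such as column 2 appearing twice in row 2 here, are harmless because every assignment writes the same value.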

Problem 2: You want to set row i of test to order applied to the corresponding row of anyMatrix. You can do this with apply; the result is transposed because apply returns each row's result as a column:

(test <- t(apply(anyMatrix, 1, order)))
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#  [1,]    1   10    7    8    4    5    3    6    2     9
#  [2,]    8    7    1    6    3    4    9    5   10     2
#  [3,]    4    9    7    1    3    2    6   10    5     8
#  [4,]    1    2    6    4   10    3    9    8    7     5
#  [5,]    9    6    5    1    2    7   10    4    8     3
#  [6,]    9    3    8    6    5   10    1    4    7     2
#  [7,]    3    7    2    5    6    8    9    4    1    10
#  [8,]    9    8    1    3    4    6    7   10    5     2
#  [9,]    8    4    3    6   10    7    9    5    2     1
# [10,]    4    1    9    3    6    7    8    2   10     5

I wouldn't expect much of a change in runtime here, because apply is really just looping through the rows, much as your loop did. Still, I would prefer this solution because it's a good deal less typing and is more idiomatic R.

Note that the two tasks needed quite different code, which is typical of R data manipulation: there are many specialized operators, and you need to pick the one that's right for your task. I don't think there's a single function, or even a small set of functions, that can handle every matrix manipulation that depends on data from another matrix.
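As a small sanity check, the sketch below compares the apply version with the explicit loop on toy data. The names a, by_col, by_row and loop_result are hypothetical. It also shows the shape of the result before and after the transpose:

```r
# Toy data (hypothetical): 2 rows, 3 columns
a <- matrix(c(3, 1, 2,
              9, 7, 8), nrow = 2, byrow = TRUE)

# apply() over rows stores the result for row i in column i,
# so the raw result is 3 x 2 rather than 2 x 3
by_col <- apply(a, 1, order)
dim(by_col)

# Transposing restores the row-wise layout used by the loop
by_row <- t(by_col)

# The explicit loop from the question, on the toy matrix
loop_result <- matrix(0L, nrow = nrow(a), ncol = ncol(a))
for (i in seq_len(nrow(a))) {
  loop_result[i, ] <- order(a[i, ])
}

identical(by_row, loop_result)
```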

Thanks, but how is cbind faster in the first one? Won't cbind take more time than the usual for loop? Do you have a benchmark? - rmania, Feb 06 '16 at 17:05
@rmania I've updated this answer to include a benchmark showing that vectorized indexing operations yield speedups compared to looping alternatives. In R, replacing many repeated fast operations with a single operation that performs all of them together often yields massive speedups. - josliber, Feb 06 '16 at 17:34
