I am working with a 19089 x 9432 sparse Matrix of class "dgCMatrix" (let's call it M
), and I have to extract each row to perform some calculations on it. I do this with a loop, so at each iteration I have to do something like currentrow <- M[i,]
to apply the body of the loop on it. The calculations are very time consuming so I want to optimize it as much as possible, and I realized that if I first divide my matrix in small pieces (M[1:100,]
, M[101:200,]
, etc...), and that I do a loop on each of those smaller matrices (therefore calling currentrow <- current_smallM[i,]
at each iteration), the loop is much more faster.
Here is a code example that I run to reproduce this :
library(Matrix)
N = 10000
M = 5000
# Creation of the large matrix (of class dgCMatrix)
largeMatrix <- Matrix(rnorm(N*M,mean=0,sd=1), byrow = TRUE, nrow = N, sparse = TRUE)
# We take into account the time for the creation of the smaller matrix, and then calculate the time to allocate the 200 rows to a variable
start.time = Sys.time()
smallMatrix = largeMatrix[100:200,]
for (i in 1:100){
test <- smallMatrix[i,]
}
end.time = Sys.time()
print(end.time - start.time) # 0.47 secs
# Same allocations but working on the large matrix
start.time = Sys.time()
for (i in 100:200){
test <- largeMatrix[i,]
}
end.time = Sys.time()
print(end.time - start.time) # 18.44 secs
You can see that the time difference is really huge... So I am really wondering :
- Why is this happening?
- Is there a more efficient way to store my data than dividing my matrix in smaller pieces?
Interestingly, I tested the same code with a matrix
object (using largeMatrix <- matrix( rnorm(N*M,mean=0,sd=1), N, M)
), and the results were totally different : 0.06 secs for the divided matrix, and 0.04 secs for the large matrix, so I am really wondering what is different with the sparse representation.
Note: I found a quite similar question here but it was with a different language, and (I think that) the solution does not apply here because it was due to an implicite type conversion, whereas here I am just extracting a row.
Thank you for your help!