Does R support very large sparse matrices? I'm working with a sparse square matrix with about 1.9M rows and columns and a density of about 0.001.

I wanted to stress-test creating a matrix of this size in R on my AWS spot instance with 480 GB of memory.

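library(Matrix)

# Matrix dimension, target density, and the resulting number of non-zero entries
DIMS = as.numeric(1988463)
DENSITY = as.numeric(0.001)
VALS = as.numeric(DIMS*DIMS*DENSITY)

# Random row indices, column indices, and values for the non-zero entries
i <- sample(DIMS, VALS, replace = TRUE)
j <- sample(DIMS, VALS, replace = TRUE)
x <- rpois(VALS, 10)

# Assemble the sparse matrix from the (i, j, x) triplets
sp_matrix <- sparseMatrix(i = i,
                          j = j,
                          x = as.numeric(x),
                          dims=list(DIMS, DIMS))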

However, I get this error:

Error in validityMethod(as(object, superClass)): long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522
Traceback:

1. system.time(sp_matrix <- sparseMatrix(i = i, j = j, x = as.numeric(x), 
 .     dims = list(DIMS, DIMS)))
2. sparseMatrix(i = i, j = j, x = as.numeric(x), dims = list(DIMS, 
 .     DIMS))
3. validObject(r)
4. anyStrings(validityMethod(as(object, superClass)))
5. isTRUE(x)
6. validityMethod(as(object, superClass))
Timing stopped at: 76.42 73.41 151
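
A quick check of the sizes involved suggests why this fails: the requested number of non-zero entries is larger than the largest value an R integer can hold, so `i`, `j` and `x` are all long vectors. The sketch below only redoes the arithmetic from the code above (the lowercase variable names are mine) and does not allocate anything large.

```r
# Same numbers as in the failing call, computed in double precision.
dims    <- 1988463
density <- 0.001
nnz     <- dims * dims * density      # requested non-zero entries, about 3.95e9
int_max <- .Machine$integer.max       # 2^31 - 1, the largest R integer

nnz > int_max                         # TRUE: i, j and x are "long vectors"
nnz / int_max                         # how many times over the limit
```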

Is there any package or workaround for this issue? In the end, I'll be using the reticulate package to load a sparse CSR matrix from NumPy so that I can use the faster and more memory-efficient text2vec package to run GloVe, which requires the data to be in dgCMatrix format.

Edit

I've also tried spam with the following lines of code to simulate a large and sparse matrix.

library(spam)
test_matrix <- spam_random(nrow = 1900000, ncol = 1900000, density = 0.001)

It runs for a while and emits the following warning:

Warning message in spam_random(nrow = 1900000, ncol = 1900000, density = 0.001):
"integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'"

It eventually fails with the following error message:

Error in if (rowp[i] == rowp[i + 1L]) next: missing value where TRUE/FALSE needed
Traceback:

1. system.time(test_matrix <- spam_random(nrow = 1900000, ncol = 1900000, 
 .     density = 0.001))
2. spam_random(nrow = 1900000, ncol = 1900000, density = 0.001)
Timing stopped at: 1657 228.3 1903
Olivier
  • It's still too large for R. You can check this post: https://stackoverflow.com/questions/54405435/sparse-matrix-support-for-long-vectors-over-231-elements. I don't know if the package still works. – StupidWolf May 07 '20 at 11:26
  • So is there no workaround that might work? I know R 3.x can handle much longer vectors than `2^31`. I've tried `spam` with no luck (see edit above). – Olivier May 07 '20 at 12:26
  • @Olivier text2vec glove takes dgTMatrix as input. So it actually needs `i, j, x` triplets. You can try to build triplet matrix with `sp_matrix <- sparseMatrix(i = i, j = j, x = as.numeric(x), dims=list(DIMS, DIMS), giveCsparse = F, check = F)`. Not sure if it will work. – Dmitriy Selivanov May 07 '20 at 13:28
  • @DmitriySelivanov Unfortunately it failed silently. ```Error in validityMethod(as(object, superClass)) : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522 Calls: ... tryCatch -> tryCatchList -> tryCatchOne -> Execution halted``` Can I assume R just can't handle sparse matrices of this size? – Olivier May 07 '20 at 15:33
  • I don't know for sure, but it seems yes. – Dmitriy Selivanov May 07 '20 at 15:46
  • To be clear on `spam`, you need to use `spam64` if you want int64 indices for your matrix. – CJR May 10 '20 at 19:44
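
For reference, the triplet construction suggested in the comments looks like this at a size that stays well under the `2^31` limit. This is a small-scale sketch only: `n` and `nnz` are made-up illustration values, and `repr = "T"` assumes Matrix 1.3 or later (older versions use `giveCsparse = FALSE`, as in the comment). It shows the shape of the call, not a fix for the full-size matrix, which according to the comments still hits the long-vector error.

```r
library(Matrix)

set.seed(1)
n   <- 1000      # hypothetical small dimension, for illustration only
nnz <- 5000      # hypothetical number of (i, j, x) triplets

# Random triplets, generated the same way as in the question
i <- sample(n, nnz, replace = TRUE)
j <- sample(n, nnz, replace = TRUE)
x <- as.numeric(rpois(nnz, 10))

# repr = "T" asks for triplet (dgTMatrix) storage instead of the default dgCMatrix
m <- sparseMatrix(i = i, j = j, x = x, dims = c(n, n), repr = "T")

class(m)
dim(m)
```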
