Is there any support for large sparse matrices in R? I'm currently dealing with a 1.9M × 1.9M sparse square matrix with a density of about 0.001.
I wanted to stress test the creation of this matrix in R on my AWS spot instance with 480 GB of memory.
library(Matrix)

DIMS <- as.numeric(1988463)
DENSITY <- as.numeric(0.001)
VALS <- as.numeric(DIMS * DIMS * DENSITY)  # ~3.95e9 nonzero entries

# random (row, column, value) triplets for the simulated matrix
i <- sample(DIMS, VALS, replace = TRUE)
j <- sample(DIMS, VALS, replace = TRUE)
x <- rpois(VALS, 10)

sp_matrix <- sparseMatrix(i = i,
                          j = j,
                          x = as.numeric(x),
                          dims = list(DIMS, DIMS))
However, I get this error:
Error in validityMethod(as(object, superClass)): long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522
Traceback:
1. system.time(sp_matrix <- sparseMatrix(i = i, j = j, x = as.numeric(x),
. dims = list(DIMS, DIMS)))
2. sparseMatrix(i = i, j = j, x = as.numeric(x), dims = list(DIMS,
. DIMS))
3. validObject(r)
4. anyStrings(validityMethod(as(object, superClass)))
5. isTRUE(x)
6. validityMethod(as(object, superClass))
Timing stopped at: 76.42 73.41 151
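Presumably the root cause is that the requested number of nonzero entries exceeds R's 32-bit integer limit, so i, j and x become long vectors. A quick back-of-the-envelope check (my own arithmetic, not part of the error output):

DIMS <- 1988463
DIMS * DIMS * 0.001    # ~3.95e9 requested nonzero entries
.Machine$integer.max   # 2147483647, the largest 32-bit integer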
Is there any package or workaround for this issue? In the end I'll be using the reticulate package to load a sparse CSR matrix from NumPy, in order to take advantage of the faster and more memory-efficient text2vec package for running GloVe, which requires the data to be in dgCMatrix format.
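For reference, this is the kind of round trip I have in mind (a minimal sketch, assuming the matrix was saved from Python with scipy.sparse.save_npz() to a hypothetical file matrix.npz, and that it is small enough to stay within R's index limits):

library(reticulate)
library(Matrix)

scipy_sparse <- import("scipy.sparse")
m_py <- scipy_sparse$load_npz("matrix.npz")   # hypothetical file name

# Go through COO triplets so Matrix::sparseMatrix() can build the
# dgCMatrix that text2vec expects (SciPy indices are 0-based).
coo <- m_py$tocoo()
m_r <- sparseMatrix(i = coo$row + 1,
                    j = coo$col + 1,
                    x = coo$data,
                    dims = unlist(coo$shape))
class(m_r)   # "dgCMatrix"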
Edit
I've also tried spam with the following lines of code to simulate a large, sparse matrix.
library(spam)
test_matrix <- spam_random(nrow = 1900000, ncol = 1900000, density = 0.001)
It runs with the following warning:
Warning message in spam_random(nrow = 1900000, ncol = 1900000, density = 0.001):
"integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'"
It eventually fails with the following error message:
Error in if (rowp[i] == rowp[i + 1L]) next: missing value where TRUE/FALSE needed
Traceback:
1. system.time(test_matrix <- spam_random(nrow = 1900000, ncol = 1900000,
. density = 0.001))
2. spam_random(nrow = 1900000, ncol = 1900000, density = 0.001)
Timing stopped at: 1657 228.3 1903