Rebuild a large sparse matrix in R from .npz file

Question

I have generated a large sparse matrix in Python in COO format, and it needs to be processed in R. The matrix contains more than 2^31-1 non-zero entries. I saved it to a .npz file and tried to rebuild it in R.

The COO sparse matrix has a shape of (1119534, 239415) with 2,230,643,376 non-zero entries.
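A quick arithmetic check of these numbers, written as a Python sketch since the matrix is built in Python, shows that the row and column index *values* fit easily in 32-bit integers. It is the *length* of the row, col and data arrays that crosses 2^31-1. The size estimate assumes, as in the R code, that all three arrays end up as 8-byte doubles after `as.numeric()`.

```python
# 2^31 - 1: the largest 32-bit signed integer, and R's classic vector-length limit
INT32_MAX = 2**31 - 1

nnz = 2_230_643_376                 # non-zero entries reported above
n_rows, n_cols = 1_119_534, 239_415

# Index values (at most n_rows - 1 and n_cols - 1) fit in 32 bits...
print(n_rows <= INT32_MAX and n_cols <= INT32_MAX)  # True
# ...but the length of each of the row/col/data arrays does not.
print(nnz > INT32_MAX)              # True
print(nnz - INT32_MAX)              # 83159729 entries over the limit

# Rough size if row, col and data are all held as 8-byte doubles
# (what as.numeric() produces in R): 3 arrays * 8 bytes * nnz.
print(3 * 8 * nnz)                  # 53535441024 bytes, about 53.5 GB
```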

Code in R

library(Matrix)
library(Rcpp)
library(reticulate)

np <- import("numpy")
npz <- np$load("LARGE_SPARSE_COO.npz")

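# row/col are 0-based indices from NumPy; data holds the non-zero values
i    <- as.numeric(npz$f[["row"]])
j    <- as.numeric(npz$f[["col"]])
v    <- as.numeric(npz$f[["data"]])
dims <- as.numeric(npz$f[["shape"]])   # c(n_rows, n_cols)

# index1 = FALSE because the NumPy indices start at 0
X <- sparseMatrix(i = i, j = j, x = v, index1 = FALSE, dims = dims)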
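The R code reads the keys `row`, `col`, `data` and `shape`. Assuming the file was written with `scipy.sparse.save_npz` (the question does not say how it was saved), those are among the keys that function stores for a COO matrix, and the stored indices are 0-based, which is why `index1=FALSE` is passed to `sparseMatrix`. A small Python round trip with a hypothetical 3 x 3 matrix illustrates this:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Hypothetical 3 x 3 stand-in for the real matrix
row = np.array([0, 1, 2])
col = np.array([1, 0, 2])
data = np.array([1.0, 2.0, 3.0])
X = sp.coo_matrix((data, (row, col)), shape=(3, 3))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "small_coo.npz")
    sp.save_npz(path, X)                 # assumption: how the real .npz was written
    with np.load(path) as npz:
        keys = set(npz.files)            # should include row, col, data, shape
        loaded_row = npz["row"]          # 0-based row indices
        loaded_col = npz["col"]          # 0-based column indices
        loaded_shape = npz["shape"]      # (n_rows, n_cols)

print(sorted(keys))
print(loaded_row, loaded_col, loaded_shape)
```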

When the number of non-zero entries is below 2^31-1, the R code above works, but when it exceeds 2^31-1, the following error occurs:

Error in `py_ref_to_r(x)`:
  negative length vectors are not allowed
Calls: as.vector ... py_to_r.numpy.ndarray -> NextMethod -> py_to_r.default -> py_ref_to_r

I think this is because the vector length exceeds the 32-bit limit of 2^31-1 elements. However, R supports vectors with 64-bit lengths ("long vectors"). How can I bring the row, col and data arrays from the .npz into R as long vectors and pass them to sparseMatrix? Or is there another way to rebuild such a large sparse matrix in R?
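One way to keep each individual transfer below the limit is to split every array into pieces of at most 2^31-1 elements on the Python side. The sketch below only computes and applies such a split; the helper names `chunk_bounds` and `split_array` are hypothetical, and it is an untested assumption that `reticulate` converts each piece without the error. It does not settle whether `sparseMatrix` can hold more than 2^31-1 entries once the pieces are combined in R.

```python
import numpy as np

INT32_MAX = 2**31 - 1


def chunk_bounds(n, limit=INT32_MAX):
    """Hypothetical helper: half-open (start, stop) ranges covering 0..n,
    each spanning at most `limit` elements."""
    return [(start, min(start + limit, n)) for start in range(0, n, limit)]


def split_array(arr, limit=INT32_MAX):
    """Hypothetical helper: slice a 1-D array into views no longer than `limit`."""
    return [arr[start:stop] for start, stop in chunk_bounds(len(arr), limit)]


# For the matrix in the question, two pieces would be needed.
print(chunk_bounds(2_230_643_376))
# [(0, 2147483647), (2147483647, 2230643376)]

# Small demonstration with an artificially low limit
parts = split_array(np.arange(10), limit=4)
print([p.tolist() for p in parts])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each piece could then be stored under its own key (for example with `np.savez`) and read separately in R.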

I cannot reduce the size of the COO sparse matrix, and some of my matrices have even more non-zero entries. Any help/insight is appreciated.

Edit 1

I am aware of the spam/spam64 packages in R but have no idea how to use them in my case. I am also not sure whether glmnet, which the sparse matrix will eventually be passed to, accepts spam's sparse matrix format.

Crazy idea: open the file in Python and serialise it to an .rds file using py2r or whatever — MDEWITT, Sep 14 '19 at 01:06
I tried to use `numpy2ri` to convert the NumPy array stored internally in the sparse matrix (`numpy2ri(X.data)`), but the same error appears — bingung, Sep 15 '19 at 21:51
It looks like an issue with type conversion in `reticulate`. Importantly, is the version of Python you are using 64-bit? There is a lot of discussion on this topic here: https://github.com/rstudio/reticulate/issues/323. It might be worth posting on the RStudio Community too. Sorry to not be more help! — MDEWITT, Sep 16 '19 at 00:55
@MDEWITT Thanks for your help! And yes, it is Python 3.7 running on a 64-bit system — bingung, Sep 16 '19 at 21:13
