I cannot access your dataset, so I can only speak from experience. When I try to load a sparse CSR matrix saved as `.npz` with `numpy.load`, I don't get a matrix back: the returned object is a `numpy.lib.npyio.NpzFile`, which I can't use in R.
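Here is a minimal Python sketch of the difference. It assumes your file was written with `scipy.sparse.save_npz`; the 3 x 4 matrix and the file name `example.npz` are made up. Such a file is an ordinary `.npz` archive that holds the component arrays of the sparse matrix. `numpy.load` therefore returns those arrays wrapped in an `NpzFile`, whereas `scipy.sparse.load_npz` reassembles them into a sparse matrix.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Hypothetical 3 x 4 matrix, stored in CSR format
dense = np.array([[0, 0, 1, 0],
                  [2, 0, 0, 3],
                  [0, 0, 0, 0]], dtype=float)
mat = sp.csr_matrix(dense)

# Assumption: the original dataset was saved with save_npz like this
path = os.path.join(tempfile.mkdtemp(), "example.npz")
sp.save_npz(path, mat)

# numpy.load only sees an archive of arrays, not a sparse matrix
with np.load(path) as archive:
    kind = type(archive).__name__   # "NpzFile"
    names = sorted(archive.files)   # component arrays (data, indices, indptr, ...)
print(kind, names)

# scipy.sparse.load_npz rebuilds the sparse matrix from those arrays
restored = sp.load_npz(path)
print(restored.format)              # csr
```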
The way I found to import the matrix into an R object, as suggested in a post you linked, is to call `scipy.sparse` through `reticulate`:
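```r
library(reticulate)
library(Matrix)  # dgRMatrix / dgCMatrix classes and their methods
scipy_sparse <- import("scipy.sparse")  # needs scipy in reticulate's Python
# load_npz() rebuilds the matrix; reticulate converts it to a Matrix object
csr_matrix <- scipy_sparse$load_npz("path_to_your_file")
```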
`csr_matrix`, which was a `scipy.sparse.csr_matrix` object (Compressed Sparse Row matrix) in Python, is automatically converted into a `dgRMatrix` from the R package `Matrix`. Note that if the file had been saved from a `scipy.sparse.csc_matrix` in Python, you would get a `dgCMatrix` (Compressed Sparse Column matrix) instead. The function that actually converts the Python object into something R can use is `py_to_r.scipy.sparse.csr.csr_matrix`, from the `reticulate` package.
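You can see the CSR/CSC distinction directly in the arrays scipy stores. The Python sketch below prints them for both formats, again using a made-up matrix and file name. As far as I know, CSR's `indptr`, `indices` and `data` correspond to the `p`, `j` and `x` slots of a `dgRMatrix`. CSC's arrays correspond to the `p`, `i` and `x` slots of a `dgCMatrix`, and both libraries use 0-based indices there. The sketch also checks that the storage format survives a `save_npz`/`load_npz` round trip.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Hypothetical example matrix
dense = np.array([[0, 0, 1, 0],
                  [2, 0, 0, 3],
                  [0, 0, 0, 0]], dtype=float)

csr = sp.csr_matrix(dense)
csc = sp.csc_matrix(dense)

# CSR: indptr has (rows + 1) entries, indices are column positions
print("CSR indptr :", csr.indptr)    # ~ p slot of a dgRMatrix
print("CSR indices:", csr.indices)   # ~ j slot
print("CSR data   :", csr.data)      # ~ x slot

# CSC: indptr has (columns + 1) entries, indices are row positions
print("CSC indptr :", csc.indptr)    # ~ p slot of a dgCMatrix
print("CSC indices:", csc.indices)   # ~ i slot
print("CSC data   :", csc.data)      # ~ x slot

# The storage format is recorded in the .npz file, so it survives a round trip
path = os.path.join(tempfile.mkdtemp(), "example_csc.npz")
sp.save_npz(path, csc)
reloaded = sp.load_npz(path)
print(reloaded.format)               # csc
```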
If you want to convert the `dgRMatrix` into a data frame, you can simply use
```r
df <- as.data.frame(as.matrix(csr_matrix))  # densifies: every zero is stored too
```
although this is probably not a good idea memory-wise if your dataset is big: `as.matrix` builds a dense matrix, so every cell, including the zeros, takes 8 bytes for numeric data.
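For a rough idea of the memory cost, here is a back-of-the-envelope sketch in Python with a hypothetical 2,000 x 1,000 matrix at 1% density; the sizes are assumptions, not taken from your data. A dense numeric matrix needs 8 bytes per cell regardless of sparsity, in R as in NumPy. CSR only stores the non-zero values, their column indices and one pointer per row. The same arithmetic should roughly apply to the result of `as.matrix` in R.

```python
import scipy.sparse as sp

# Hypothetical dataset size: 2,000 x 1,000 with 1% non-zeros
n_rows, n_cols, density = 2_000, 1_000, 0.01

# Random sparse matrix in CSR format (values drawn uniformly from [0, 1))
mat = sp.random(n_rows, n_cols, density=density, format="csr", dtype="float64")

# Bytes actually used by the three CSR arrays
sparse_bytes = mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes

# Bytes a dense float64 matrix of the same shape would need (8 bytes per cell),
# computed without materialising it
dense_bytes = n_rows * n_cols * 8

print(f"non-zeros : {mat.nnz}")
print(f"sparse    : {sparse_bytes / 1e6:.2f} MB")
print(f"dense     : {dense_bytes / 1e6:.2f} MB")
print(f"ratio     : {dense_bytes / sparse_bytes:.0f}x")
```

With these made-up numbers the dense version is dozens of times larger than the sparse one, and the gap grows as the density drops.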
I hope this helped!