I have a NumPy sparse matrix saved as an .npz file, which I want to load into R. This is what I tried:

library(reticulate)
np <- import("numpy")
npz_train <- np$load('netflix_matrix.npz')
df <- npz_train$f[["data"]]

The length of df is 103327692, but

npz_train$f[["shape"]]
[1]   17770 2649429

which is a totally different size from 103327692. I know it is a sparse matrix, so how can I read it as one? I would like to have NAs in the blanks.

Thank you!

Rachel
  • Look at the code for `save_npz` or `load_npz` to see how the sparse matrix is saved and loaded. The details differ depending on the format. If it is the common `csr` format, `load` uses: `cls((loaded['data'], loaded['indices'], loaded['indptr']), shape=loaded['shape'])`. I believe `R` has a sparse format, but you'll have to study its docs to determine how it can be used with attributes like this. – hpaulj Dec 05 '20 at 20:22
  • That `shape` and `data` shape look reasonable for a sparse matrix, with a sparsity of .002. `data` has the non-zero values of this large array. What you call 'blanks' are zeros. – hpaulj Dec 05 '20 at 20:35
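
To make the two comments above concrete, here is a minimal pure-Python sketch of how the `csr` pieces (`data`, `indices`, `indptr`, `shape`) describe the full matrix, and why roughly 103 million stored values are consistent with a 17770 x 2649429 shape. The helper names `csr_to_dense` and `density` are hypothetical, the toy matrix is made up for illustration, and this assumes the file really is in `csr` format; it is not what `load_npz` does internally, only an illustration of the layout.

```python
def csr_to_dense(data, indices, indptr, shape, fill=None):
    """Expand CSR arrays into a dense list of rows (hypothetical helper).

    Row i owns the stored entries data[indptr[i]:indptr[i + 1]], and
    indices holds the column of each of those entries. Positions with no
    stored entry get `fill` (None here, standing in for the asker's NAs;
    SciPy itself treats them as zeros).
    """
    n_rows, n_cols = shape
    dense = [[fill] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            dense[row][indices[k]] = data[k]
    return dense


def density(nnz, shape):
    """Fraction of cells that actually hold a stored value."""
    return nnz / (shape[0] * shape[1])


# Toy 3 x 3 matrix with three stored values (made-up data).
toy = csr_to_dense(data=[5, 3, 4], indices=[0, 2, 1],
                   indptr=[0, 2, 2, 3], shape=(3, 3))

# The numbers from the question: length of `data` versus the full shape.
netflix_density = density(103327692, (17770, 2649429))
```

Only use the dense expansion on small matrices: at the size in the question the full array would have about 47 billion cells, which is why the file stores just the non-empty ones.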

1 Answer

I figured it out:

library(reticulate)
# Import the scipy.sparse submodule, which provides load_npz()
scipy <- import("scipy.sparse")
sparse_mat <- scipy$load_npz('netflix_matrix.npz')
Rachel Rap