With scipy and rpy2, you can read each dgCMatrix object directly into Python as a scipy.sparse.csc_matrix object. Both use the compressed sparse column (CSC) format with 0-based row indices and 0-based column pointers, so no preprocessing is needed. All you need to do is pass the slots of the dgCMatrix object (x, i, p and Dim) as arguments to the csc_matrix constructor.
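To see why the slots can be passed straight through, here is a small pure-Python sketch of how CSC arrays are interpreted. The slot values are read off the printed matrix l[["1986"]] shown further down; they are not produced by R here. csc_triples is a hypothetical helper used only for illustration.

```python
# Slot values of l[["1986"]], read off its printed form column by column.
# In R these would be x@x, x@i, x@p and x@Dim.
x_slot = [3.0, 2.0, 5.0, 4.0, 1.0, 6.0]  # nonzero values in column-major order
i_slot = [1, 3, 0, 5, 4, 2]              # 0-based row index of each value
p_slot = [0, 1, 2, 3, 4, 5, 6]           # column j owns positions p[j] .. p[j+1]-1
dim = (6, 6)


def csc_triples(data, indices, indptr, shape):
    """Decode CSC arrays into (row, col, value) triples (hypothetical helper).

    scipy.sparse.csc_matrix reads (data, indices, indptr) the same way.
    """
    n_rows, n_cols = shape
    if len(indptr) != n_cols + 1:
        raise ValueError("indptr must have one entry per column plus one")
    triples = []
    for col in range(n_cols):
        # Entries of this column sit at positions indptr[col] .. indptr[col+1]-1.
        for k in range(indptr[col], indptr[col + 1]):
            row = indices[k]
            if not 0 <= row < n_rows:
                raise ValueError(f"row index {row} out of range")
            triples.append((row, col, data[k]))
    return triples


for row, col, value in csc_triples(x_slot, i_slot, p_slot, dim):
    print((row, col), value)  # first line: (1, 0) 3.0
```

The triples come out in the same order as print(y) in the Python session below.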
To test it out, I used R to create an RDS file storing a list of dgCMatrix objects:
library("Matrix")
set.seed(1L)
d <- 6L
n <- 10L
l <- replicate(n, sparseMatrix(i = sample(d), j = sample(d), x = sample(d), repr = "C"), simplify = FALSE)
names(l) <- as.character(seq(1986L, length.out = n))
l[["1986"]]
## 6 x 6 sparse Matrix of class "dgCMatrix"
##
## [1,] . . 5 . . .
## [2,] 3 . . . . .
## [3,] . . . . . 6
## [4,] . 2 . . . .
## [5,] . . . . 1 .
## [6,] . . . 4 . .
saveRDS(l, file = "list_of_dgCMatrix.rds")
Then, in Python:
from scipy import sparse
from rpy2 import robjects
readRDS = robjects.r['readRDS']
l = readRDS('list_of_dgCMatrix.rds')
x = l.rx2('1986') # in R: l[["1986"]]
x
## <rpy2.robjects.methods.RS4 object at 0x120db7b00> [RTYPES.S4SXP]
## R classes: ('dgCMatrix',)
print(x)
## 6 x 6 sparse Matrix of class "dgCMatrix"
##
## [1,] . . 5 . . .
## [2,] 3 . . . . .
## [3,] . . . . . 6
## [4,] . 2 . . . .
## [5,] . . . . 1 .
## [6,] . . . 4 . .
data = x.do_slot('x') # in R: x@x
indices = x.do_slot('i') # in R: x@i
indptr = x.do_slot('p') # in R: x@p
shape = x.do_slot('Dim') # in R: x@Dim or dim(x)
y = sparse.csc_matrix((data, indices, indptr), tuple(shape))
y
## <6x6 sparse matrix of type '<class 'numpy.float64'>'
## with 6 stored elements in Compressed Sparse Column format>
print(y)
## (1, 0) 3.0
## (3, 1) 2.0
## (0, 2) 5.0
## (5, 3) 4.0
## (4, 4) 1.0
## (2, 5) 6.0
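If you have many matrices to convert, the four do_slot calls can be wrapped in a helper. This is a minimal sketch. dgcmatrix_to_csc and list_to_csc are hypothetical names. It assumes the RDS file holds a named list, which rpy2 exposes as a ListVector with .names and .rx2. The explicit numpy.asarray calls only fix the dtypes, and the Dimnames slot is ignored.

```python
import numpy as np
from scipy import sparse


def dgcmatrix_to_csc(x):
    """Convert an rpy2 RS4 object of class dgCMatrix to a scipy csc_matrix.

    Assumes x provides do_slot(), as rpy2's RS4 objects do. Row and column
    names (the Dimnames slot) are not copied.
    """
    data = np.asarray(x.do_slot('x'), dtype=np.float64)   # x@x
    indices = np.asarray(x.do_slot('i'), dtype=np.int32)  # x@i, 0-based rows
    indptr = np.asarray(x.do_slot('p'), dtype=np.int32)   # x@p, column pointers
    n_rows, n_cols = (int(d) for d in x.do_slot('Dim'))   # x@Dim
    return sparse.csc_matrix((data, indices, indptr), shape=(n_rows, n_cols))


def list_to_csc(l):
    """Convert a named R list of dgCMatrix objects to a dict of csc_matrix."""
    return {name: dgcmatrix_to_csc(l.rx2(name)) for name in l.names}
```

With the RDS file above, list_to_csc(readRDS('list_of_dgCMatrix.rds')) should return a dict keyed by "1986" through "1995".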
Here, y is an object of class scipy.sparse.csc_matrix. You should rarely need to call its toarray method to convert y to a dense array, which for a large matrix can use a lot of memory. scipy.sparse implements the matrix operations I would expect to need (arithmetic, matrix products, transposition, slicing and reductions) directly on sparse storage. For example, here are the row and column sums of y:
y.sum(1) # in R: as.matrix(rowSums(x))
## matrix([[5.],
## [3.],
## [6.],
## [2.],
## [1.],
## [4.]])
y.sum(0) # in R: t(as.matrix(colSums(x)))
## matrix([[3., 2., 5., 4., 1., 6.]])
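Because csc_matrix.sum returns a 2-D numpy.matrix, wrapping it in numpy.asarray(...).ravel() gives a plain 1-D array. The sketch below shows this and a few other operations that do not require densifying y. It rebuilds y from the values printed above so it runs without R. The R equivalents in the comments are approximate.

```python
import numpy as np
from scipy import sparse

# Rebuild y from its (data, indices, indptr) arrays so the snippet is standalone.
y = sparse.csc_matrix(
    ([3.0, 2.0, 5.0, 4.0, 1.0, 6.0], [1, 3, 0, 5, 4, 2], [0, 1, 2, 3, 4, 5, 6]),
    shape=(6, 6),
)

# sum() on a csc_matrix returns a 2-D numpy.matrix; flatten to a 1-D ndarray.
row_sums = np.asarray(y.sum(axis=1)).ravel()  # in R: rowSums(x)
col_sums = np.asarray(y.sum(axis=0)).ravel()  # in R: colSums(x)

# Other everyday operations work on the sparse matrix directly.
v = np.ones(6)
yv = y @ v          # in R: x %*% rep(1, 6); here a dense 1-D array
yt = y.T            # in R: t(x); the transpose of a CSC matrix is CSR
yy = y @ y          # sparse-sparse product stays sparse
sub = y[:, [0, 2]]  # in R: x[, c(1, 3)]

print(row_sums)  # [5. 3. 6. 2. 1. 4.]
print(col_sums)  # [3. 2. 5. 4. 1. 6.]
```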