I am trying to build a large sparse matrix with a split-apply-combine approach. I call `sparse.model.matrix()` from the Matrix package separately on subsets of the columns of a data frame, then bind the results together into one full matrix. I have to do this because of memory limitations (I can't call `sparse.model.matrix()` on the whole data frame at once). This part works, and I get a list of sparse matrices. However, they don't all have the same number of rows, so I can't bind them together.
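The split-apply step described above can be sketched as follows. This is a minimal illustration on a small made-up data frame, not the original data. `df` and the chunk list `column_groups` are hypothetical names, and each chunk is encoded without an intercept (`~ . - 1`).

```r
library(Matrix)

# Hypothetical data: two numeric columns and two factors containing NAs
df <- data.frame(
  num1 = c(1.5, 2.0, 3.1, 0.4, 2.2, 1.1),
  num2 = c(10, 20, 30, 40, 50, 60),
  f1   = factor(c("a", "b", NA, "c", "a", "b")),
  f2   = factor(c("x", NA, "y", "x", "y", "x"))
)

# Hypothetical grouping of columns into chunks small enough to fit in memory
column_groups <- list(c("num1", "num2"), c("f1", "f2"))

# Apply: build one sparse design matrix per chunk (row names are kept by default)
chunks <- lapply(column_groups, function(cols) {
  sparse.model.matrix(~ . - 1, df[, cols, drop = FALSE])
})

# The second chunk loses the rows that contain an NA in f1 or f2
sapply(chunks, nrow)
#> [1] 6 4
```

Because rows with an `NA` are dropped independently in each chunk, the chunks can end up with different row counts, and that mismatch is what makes the final bind fail.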
For example:

```r
library(Matrix)

data(iris)
set.seed(100)
iris$v6 <- sample(c("a", "b", "c", NA), 150, replace = TRUE)
iris$v7 <- sample(c("x", "y", NA), 150, replace = TRUE)

sparse_m1 <- sparse.model.matrix(~ ., iris[, 1:5])
sparse_m2 <- sparse.model.matrix(~ . - 1, iris[, 6:7])

dim(sparse_m1)
#> [1] 150   7
dim(sparse_m2)
#> [1] 71  4

cbind2(sparse_m1, sparse_m2)
#> Error: Matrices must have same number of rows in cbind2(sparse_m1, sparse_m2)
cbind(sparse_m1, sparse_m2)
#> Error: Matrices must have same number of rows in cbind2(..1, r)
```
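The row names of `sparse_m2` are a subset of those of `sparse_m1`. The missing rows were dropped because they had a missing value in `v6` or `v7` (the model frame's default `na.action`, `na.omit`, drops any row that contains an `NA`). Is there any way to combine the two matrices?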
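One possible workaround, sketched under the assumption that every row name of `sparse_m2` also appears in `sparse_m1`: expand `sparse_m2` to the full set of rows, leaving the dropped rows empty, and then `cbind()` as usual. `pad_rows()` is a hypothetical helper, not part of Matrix. Note that the padded rows become all-zero, so an observation with a missing `v6`/`v7` is encoded as if every dummy were 0; whether that is acceptable depends on the model.

```r
library(Matrix)

# Hypothetical helper: expand `m` to the rows named in `all_rows`,
# filling the rows that `m` lacks with structural zeros.
pad_rows <- function(m, all_rows) {
  stopifnot(all(rownames(m) %in% all_rows))
  trip  <- as(m, "TsparseMatrix")        # triplet form; slots i and j are 0-based
  new_i <- match(rownames(m), all_rows)  # position of each existing row in the full set
  sparseMatrix(
    i = new_i[trip@i + 1L],
    j = trip@j + 1L,
    x = trip@x,
    dims = c(length(all_rows), ncol(m)),
    dimnames = list(all_rows, colnames(m))
  )
}

# Rebuild the matrices from the question
data(iris)
set.seed(100)
iris$v6 <- sample(c("a", "b", "c", NA), 150, replace = TRUE)
iris$v7 <- sample(c("x", "y", NA), 150, replace = TRUE)
sparse_m1 <- sparse.model.matrix(~ ., iris[, 1:5])
sparse_m2 <- sparse.model.matrix(~ . - 1, iris[, 6:7])

# Pad sparse_m2 to all 150 rows, then bind column-wise
sparse_m2_full <- pad_rows(sparse_m2, rownames(sparse_m1))
combined <- cbind(sparse_m1, sparse_m2_full)
dim(combined)  # 150 rows, ncol(sparse_m1) + ncol(sparse_m2) columns
```

Building the result directly with `sparseMatrix()` from the triplets keeps everything sparse and avoids a dense intermediate, which matters given the memory constraints described above.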
I also tried `rbind.fill.matrix()` from the plyr package: I transposed each matrix, called `rbind.fill.matrix()`, and transposed the result back. However, this loses the column names, because `rbind.fill.matrix()` ignores row names.
Any ideas?