bind together sparse model matrices by row names

Question

I am trying to construct a large sparse matrix with a split-apply-combine approach by separately calling sparse.model.matrix() from the package Matrix on subsets of columns of a dataframe and then binding them together into a full matrix. I have to do this because of memory limitations (I can't call sparse.model.matrix on the whole df at once). This process works fine, and I get a list of sparse matrices, but these have different dimensions and when I try to bind them together, I can't.

Example:

library(Matrix)
data(iris)
set.seed(100)
iris$v6 <- sample(c("a","b","c",NA), 150, replace=TRUE)
iris$v7 <- sample(c("x","y",NA), 150, replace = TRUE)

sparse_m1 <- sparse.model.matrix(~., iris[,1:5])
sparse_m2 <- sparse.model.matrix(~.-1, iris[, 6:7])

dim(sparse_m1)
[1] 150   7

dim(sparse_m2)
[1] 71  4

cbind2(sparse_m1, sparse_m2)
Error: Matrices must have same number of rows in cbind2(sparse_m1, sparse_m2)

cbind(sparse_m1, sparse_m2)
Error: Matrices must have same number of rows in cbind2(..1, r)

The matrices share row names; some rows were simply omitted from sparse_m2 because they had a missing value in at least one of the two columns. Is there any way to combine them?

I also tried rbind.fill.matrix() from the plyr package (transposing, binding, then transposing back), but that loses the column names, because rbind.fill.matrix() ignores row names.

Any ideas?

How can you cbind two matrices with different number of rows? — Ven Yao, Oct 19 '15 at 02:45
I'm thinking that, instead of cbind, maybe there is an easy way to re-represent a dgCMatrix in triplet form and then concatenate the triplets? — Allen Wang, Oct 19 '15 at 13:06

score 1 · Accepted Answer · answered Feb 20 '22 at 18:32

An old question still in need of an answer...

One approach is to create an empty Matrix of the required dimensions and then populate it:

m12.dimnames <- list(union(rownames(sparse_m1), rownames(sparse_m2)),
                     c(colnames(sparse_m1), colnames(sparse_m2)))
m12 <- Matrix(0, nrow = length(m12.dimnames[[1]]), ncol = length(m12.dimnames[[2]]),
              dimnames = m12.dimnames)
# Fill each block by name; rows absent from a matrix stay zero
m12[rownames(sparse_m1), colnames(sparse_m1)] <- sparse_m1
m12[rownames(sparse_m2), colnames(sparse_m2)] <- sparse_m2
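
The triplet idea raised in the comments can also be sketched directly, which avoids sub-assignment into a sparse Matrix. The helper expand_rows() below is hypothetical (it is not part of the Matrix package), and the toy matrices a and b are made up for illustration. It assumes numeric sparse matrices (such as the dgCMatrix objects that sparse.model.matrix() returns) with unique row names.

```r
library(Matrix)

# Hypothetical helper (not part of Matrix): expand `m` to the rows named in
# `all_rows`; rows that `m` lacks come out as all-zero rows.
expand_rows <- function(m, all_rows) {
  trip  <- as(m, "TsparseMatrix")        # triplet form; slots i and j are 0-based
  new_i <- match(rownames(m), all_rows)  # new 1-based position of each old row
  sparseMatrix(
    i = new_i[trip@i + 1L],
    j = trip@j + 1L,
    x = trip@x,
    dims = c(length(all_rows), ncol(m)),
    dimnames = list(all_rows, colnames(m))
  )
}

# Toy inputs (illustrative only): `b` is missing row "2"
a <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 1), x = c(1, 2, 3),
                  dims = c(3, 2),
                  dimnames = list(c("1", "2", "3"), c("a1", "a2")))
b <- sparseMatrix(i = c(1, 2), j = c(1, 1), x = c(5, 6),
                  dims = c(2, 1),
                  dimnames = list(c("1", "3"), "b1"))

all_rows <- union(rownames(a), rownames(b))
m12 <- cbind(expand_rows(a, all_rows), expand_rows(b, all_rows))
```

With the objects from the question, the same call would presumably be cbind(expand_rows(sparse_m1, all_rows), expand_rows(sparse_m2, all_rows)). As with the approach above, rows that were dropped for missing values are filled with zeros, so they cannot be told apart from genuine zeros afterwards.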

score 0 · Answer 2 · answered Aug 12 '19 at 13:13

I recently ran into the same issue. Nowadays you can use rBind.fill() from the Matrix.utils package:

install.packages("Matrix.utils")
library(Matrix.utils)
sparse_filled <- rBind.fill(sparse_m1, sparse_m2)

coulminer

This is the answer to a different question. Note that the OP wants to `cbind`, not `rbind`, the results. – malcook Feb 20 '22 at 18:10
