I have a large sparse matrix (a dgCMatrix object from the Matrix package in R).

Toy example:

m <- Matrix(c(0,0,2:0), 3,5)
rownames(m) <- paste0("g",1:3)
colnames(m) <- paste0("c",1:5)
> m
3 x 5 sparse Matrix of class "dgCMatrix"
   c1 c2 c3 c4 c5
g1  .  1  .  .  2
g2  .  .  2  .  1
g3  2  .  1  .  .

I want to melt it into a long-format data.frame.

reshape2's melt requires coercing the dgCMatrix to a base matrix object, which is very slow for the dimensions I'm working with, so I'm looking for something more efficient.

I thought mefa4's Melt would do the trick, but it drops the zero values:

> mefa4::Melt(m)
  rows cols value
1   g3   c1     2
2   g1   c2     1
3   g2   c3     2
4   g3   c3     1
5   g1   c5     2
6   g2   c5     1

I would like to keep the zeros, but I don't see a parameter for that in the manual for mefa4::Melt. Any ideas for an alternative?

dan

1 Answer

Data:

m <- Matrix(c(0,0,2:0), 3,5)
rownames(m) <- paste0("g",1:3)
colnames(m) <- paste0("c",1:5)

Solution:

data.frame(rows=rownames(m)[row(m)],cols=colnames(m)[col(m)],value=as.numeric(m))
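
To see why the one-liner keeps the zeros, here is a minimal self-contained sketch on the toy matrix. The name `long` is only illustrative. It relies on base R's row() and col() using nothing but dim() of their argument, and on as.numeric() returning every cell in column-major order, zeros included (so the values are expanded to a dense vector at that point).

```r
library(Matrix)

# Toy matrix from the question (3 rows, 5 columns, mostly zeros)
m <- Matrix(c(0, 0, 2:0), 3, 5)
rownames(m) <- paste0("g", 1:3)
colnames(m) <- paste0("c", 1:5)

# row(m) and col(m) give the row and column index of every cell,
# in the same column-major order in which as.numeric(m) lists the values
long <- data.frame(rows  = rownames(m)[row(m)],
                   cols  = colnames(m)[col(m)],
                   value = as.numeric(m))

nrow(long)            # one row per cell: 3 * 5 = 15
sum(long$value == 0)  # number of zero cells that were kept
head(long, 3)         # first column of m: g1, g2, g3 under c1
```

Because the result has one row per cell, its size grows with nrow(m) * ncol(m) rather than with the number of non-zero entries.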

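On the toy matrix this has a lower median run time (about 477 microseconds) than reshape2's melt of the coerced matrix (about 727 microseconds) and mefa4's Melt (about 678 microseconds), although mefa4's Melt returns only the non-zero entries: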

mf <- function(x){
  mefa4::Melt(x)
}

df <- function(x){
  data.frame(rows=rownames(x)[row(x)],cols=colnames(x)[col(x)],value=as.numeric(x))
}

rs <- function(x){
  reshape2::melt(Matrix::as.matrix(x))
}

microbenchmark::microbenchmark(df(m))
Unit: microseconds
  expr     min      lq     mean  median      uq      max neval
 df(m) 330.378 376.887 778.3489 476.903 699.743 14953.47   100

microbenchmark::microbenchmark(rs(m))
Unit: microseconds
  expr     min      lq     mean  median       uq     max neval
 rs(m) 537.328 638.683 1197.268 726.679 1042.261 15073.3   100

microbenchmark::microbenchmark(mf(m))
Unit: microseconds
  expr     min       lq     mean   median       uq      max neval
 mf(m) 459.947 526.6585 1023.575 677.9775 1007.214 6712.497   100
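
Timings on a 3 x 5 matrix are mostly function-call overhead, so it may be worth repeating the comparison on something closer to the real data. The following is a rough sketch under assumptions: the dimensions (1000 x 200), the 5% density and the name `big` are made up for illustration, and the result will depend on the machine and the actual matrix. It uses as.matrix() with Matrix attached, and leaves out mefa4::Melt because it does not return the zero cells.

```r
library(Matrix)

set.seed(1)  # reproducible random matrix

# Hypothetical test matrix: 1000 x 200 with about 5% non-zero entries
big <- rsparsematrix(1000, 200, density = 0.05)
dimnames(big) <- list(paste0("g", seq_len(nrow(big))),
                      paste0("c", seq_len(ncol(big))))

# Same helpers as above
df <- function(x) {
  data.frame(rows = rownames(x)[row(x)],
             cols = colnames(x)[col(x)],
             value = as.numeric(x))
}
rs <- function(x) {
  reshape2::melt(as.matrix(x))
}

# Only run the timing if the optional packages are installed
if (requireNamespace("microbenchmark", quietly = TRUE) &&
    requireNamespace("reshape2", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(df(big), rs(big), times = 10))
}
```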
dan