I am trying to factorize very large matrices with the Python library Nimfa. Since the matrix is so large, I am unable to instantiate it in a dense format in memory, so instead I use scipy.sparse.csr_matrix.
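For context, a matrix like this can be assembled directly in sparse form from coordinate triplets, never touching a dense array; the shape and values below are just placeholders, not my real data:

import numpy as np
from scipy.sparse import coo_matrix

# placeholder example: assemble from (row, col, value) triplets,
# then convert to CSR without ever allocating the dense array
rows = np.array([0, 3, 7, 999999])
cols = np.array([5, 2, 8, 42])
vals = np.array([1.0, 1.0, 2.0, 1.0])
big = coo_matrix((vals, (rows, cols)), shape=(1000000, 100)).tocsr()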
The library has a factorization method called Snmf: Sparse Nonnegative Matrix Factorization (SNMF), which appears to be what I am looking for.
When trying it out, I ran into serious performance issues with the factorization (not with the memory representation, but with speed): I have not yet been able to factor even a simple sparse 10 x 95 matrix.
This is how I build the test matrix:
import random
from scipy.sparse import lil_matrix, csc_matrix

m1 = lil_matrix((10, 95))  # LIL is cheap to fill entry by entry
for i in xrange(10):
    for j in xrange(95):
        if random.random() > 0.8: m1[i, j] = 1  # roughly 20% of entries become 1
m1 = csc_matrix(m1)  # convert to CSC for the factorization
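(As an aside, an equivalent test matrix can be generated in one call with scipy.sparse.rand, if your SciPy version has it; density=0.2 approximates the > 0.8 threshold above, and the data array is overwritten with ones to match:)

from scipy.sparse import rand

# roughly equivalent test matrix: ~20% of entries nonzero, CSC format
m1_alt = rand(10, 95, density=0.2, format='csc')
m1_alt.data[:] = 1  # replace the random values with ones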
and this is how I run it:
import numpy
import nimfa
from time import time

t = time()
fctr = nimfa.mf(m1,
                seed="random_vcol",
                rank=2,
                method="snmf",
                max_iter=15,
                initialize_only=True,
                version='r',  # SNMF/R variant
                eta=1.,
                beta=1e-4,
                i_conv=10,
                w_min_change=0)
print numpy.shape(m1)
a = nimfa.mf_run(fctr)  # this is the call that never seems to return
print a.coef()
print a.basis()
print time() - t
This doesn't seem to finish at all, but if I pass m1.todense() instead, it finishes in seconds. Since I am unable to instantiate my real matrix densely, that is not a workable solution for me.
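To be explicit about the dense comparison, this is roughly what I mean by passing m1.todense() (same parameters, only the target changed; sketched here rather than copied verbatim from my script):

fctr_dense = nimfa.mf(m1.todense(),  # numpy.matrix target instead of sparse
                      seed="random_vcol",
                      rank=2,
                      method="snmf",
                      max_iter=15,
                      initialize_only=True,
                      version='r',
                      eta=1.,
                      beta=1e-4,
                      i_conv=10,
                      w_min_change=0)
res = nimfa.mf_run(fctr_dense)  # this completes in seconds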
I have tried different scipy.sparse matrix formats, but to no avail: csc_matrix, csr_matrix and dok_matrix.
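(Each format was produced by converting the same matrix, along these lines:)

m_csc = m1.tocsc()  # compressed sparse column
m_csr = m1.tocsr()  # compressed sparse row
m_dok = m1.todok()  # dictionary of keys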
Am I using the wrong matrix format? Which matrix operations does the SNMF algorithm need to be fast? Is there some other mistake I am overlooking?