
I am working with a 6650254x5650 sparse matrix whose values are in numpy.float64 format.

I am using the NMF implementation from scikit-learn as follows:

from sklearn.decomposition import NMF

# X_all_sparse: the 6650254x5650 scipy sparse matrix of float64 values
model = NMF(n_components=12, init='random', random_state=0, max_iter=20, l1_ratio=0.01)
W = model.fit_transform(X_all_sparse)  # W has shape (6650254, 12)
H = model.components_                  # H has shape (12, 5650)
W

It seems that for a larger number of n_components I get W matrices in which every element is NaN. For example, it fails when n_components is larger than 7, yet it works when n_components is 19! I wonder what can cause this, and what other libraries can handle such big matrices efficiently so that I can benchmark against them.
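
One way to narrow this down is to sweep n_components and check the returned factors for NaNs. This is a hypothetical diagnostic sketch, assuming X_all_sparse is the matrix from the snippet above:

import numpy as np
from sklearn.decomposition import NMF

# Hypothetical sweep over ranks to see which ones return all-NaN factors;
# assumes X_all_sparse is the sparse matrix described above.
for k in (7, 8, 12, 19):
    model = NMF(n_components=k, init='random', random_state=0,
                max_iter=20, l1_ratio=0.01)
    W = model.fit_transform(X_all_sparse)
    status = "all-NaN" if np.isnan(W).all() else "ok"
    print(k, status, model.reconstruction_err_)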

Update: if others run into a similar problem, I am meanwhile using the implicit library.
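
For reference, a minimal sketch of that route (the factors and iterations values here are illustrative, not tuned, and the expected matrix orientation has changed between implicit versions):

import implicit
from scipy.sparse import csr_matrix

# Illustrative ALS setup with the implicit library; parameters are not tuned.
als = implicit.als.AlternatingLeastSquares(factors=12, iterations=20)
# Recent implicit releases expect a user-item CSR matrix;
# older releases expected item-user, so check your installed version.
als.fit(csr_matrix(X_all_sparse))

user_vecs = als.user_factors  # (n_users, factors), analogous to W
item_vecs = als.item_factors  # (n_items, factors), rows are item factors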

  • Same problem here, no idea what's going on. What do you mean by implicit library? – nlhnt Sep 01 '21 at 19:20
  • 1
    @nlhnt use this https://implicit.readthedocs.io/en/latest/ – Areza Sep 02 '21 at 20:07
  • I found that I got NaNs only when I wrapped my 1-D result array in pd.Series(<1d-np.array with my results>). The array itself contained no NaNs, but the resulting Series was faulty: it contained NaNs and some weird values as well. Dropping the conversion from np.array to pd.Series, and instead merging my final DF with the raw np.arrays, worked for me. – nlhnt Sep 03 '21 at 07:28
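
A hypothetical illustration of the workaround nlhnt describes: a fresh pd.Series carries a default 0..n-1 index, so assigning it into a DataFrame aligns on the index and can introduce NaNs, while assigning the raw np.array is positional:

import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [10, 20, 30]}, index=[10, 20, 30])
scores = np.array([0.1, 0.2, 0.3])  # 1-D result array, e.g. one column of W

# Wrapping in a Series first aligns on the index -> all NaN here,
# because the Series index 0..2 does not match the frame index 10/20/30.
df["bad"] = pd.Series(scores)
# Assigning the raw array is positional and keeps the values intact.
df["good"] = scores

print(df)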

0 Answers