
I have a 50,000 x 15 numpy matrix of continuous data. I want to use MDS (Multi-Dimensional Scaling) to reduce it to 2 components in order to visualise the data in a 2-D vector space. For some reason, whenever I run MDS on my data, memory and CPU usage climb sharply and my kernel crashes, telling me I need to restart. Has anyone run into similar issues, or does anyone know what may be causing this?
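
For reference, this is roughly what I'm running (a minimal sketch with random stand-in data; I'm using sklearn.manifold.MDS):

    import numpy as np
    from sklearn.manifold import MDS

    X = np.random.rand(50000, 15)  # stand-in for my actual continuous data

    mds = MDS(n_components=2)
    X_2d = mds.fit_transform(X)  # memory and CPU spike here, then the kernel dies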

I'm using a MacBook Air, 125GB SSD, 4GB RAM and my development environment is the Spyder IDE.

Thanks

daniel3412

2 Answers


I recommend running MDS with a 5% random sample (see the sketch below). Looking through the scikit-learn documentation, it seems most of the algorithms in the manifold learning module have complexity of O(n^2). There is no specific complexity documented for MDS, but comparing run times I can only assume it is O(n^2) or worse. With 50,000 samples, the pairwise dissimilarity matrix alone holds 50,000 x 50,000 float64 values, roughly 20 GB, far more than 4 GB of RAM. Too much data, an inefficient algorithm, and a small amount of RAM add up to a kernel crash.

http://scikit-learn.org/stable/modules/manifold.html#manifold
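
A minimal sketch of the subsampling approach (X here is a random placeholder for your 50,000 x 15 matrix):

    import numpy as np
    from sklearn.manifold import MDS

    X = np.random.rand(50000, 15)  # placeholder for your data matrix

    # Draw a 5% random sample of the rows without replacement.
    rng = np.random.RandomState(0)
    idx = rng.choice(X.shape[0], size=int(0.05 * X.shape[0]), replace=False)
    X_sample = X[idx]

    # MDS materialises an n x n dissimilarity matrix, so 2,500 rows cost
    # about 50 MB for that matrix, versus ~20 GB for all 50,000 rows.
    mds = MDS(n_components=2, random_state=0)
    X_2d = mds.fit_transform(X_sample)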

misspelled
  • +1 I have a similar issue. My kernel does not crash, but calculations do not finish even after a couple of hours. The only solution I found is to use a small sample, as recommended above. – lanenok Apr 24 '15 at 15:56

Our current implementation of MDS is based on the SMACOF method, which is too generic. A PCA / SVD might be much faster in many cases. That is planned as a pull request.

In the meantime, you can directly use sklearn.decomposition.RandomizedPCA instead of the MDS class.
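
For example (a minimal sketch; note that in newer scikit-learn releases RandomizedPCA has been folded into PCA(svd_solver='randomized')):

    import numpy as np
    from sklearn.decomposition import RandomizedPCA

    X = np.random.rand(50000, 15)  # placeholder for your data matrix

    # Randomized PCA never builds an n x n matrix, so memory stays
    # proportional to the input itself and runtime is roughly linear in n.
    pca = RandomizedPCA(n_components=2, random_state=0)
    X_2d = pca.fit_transform(X)  # shape (50000, 2), ready to scatter-plot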

ogrisel