Unsupervised learning: reducing dimensionality and clustering

Question

I am trying to understand how to split my data into clusters using unsupervised learning, for example with the k-means method.

I have 20 columns of data. How can they be projected onto a 2D surface without losing necessary information from the other 18 columns?

What should I use to do that?

Any help will be appreciated.

You can't use kmeans for that. Clustering is not a projection. — Has QUIT--Anony-Mousse, Jul 01 '18 at 02:16

score 1 · Accepted Answer · answered Jun 30 '18 at 23:12

If you are simply interested in viewing your data in 2 dimensions, consider using t-SNE. The scikit-learn Python package has a great implementation you can use. However, remember that you shouldn't cluster your data on the t-SNE output, because the space your data resides in gets substantially distorted in the process: only short distances are maintained, whereas longer distances are heavily altered to be either shorter or longer.
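
As a rough illustration of the t-SNE suggestion, here is a minimal sketch using `sklearn.manifold.TSNE`. The random array `X` is only a hypothetical stand-in for your 20-column data, and the standardization step and the `perplexity=30` setting are assumptions rather than requirements. The result is meant for plotting only.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 200 rows, 20 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))

# Assumption: put all columns on a comparable scale so that no
# single feature dominates the pairwise distances t-SNE works with.
X_scaled = StandardScaler().fit_transform(X)

# Embed the 20-dimensional points into 2 dimensions for viewing.
# perplexity must be smaller than the number of rows.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape)  # (200, 2): one 2D point per original row
```

The two columns of `embedding` can be passed to any scatter plot. They are not a good input for a clustering algorithm, for the reason given above.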

alta

Thank you for the answer. And how can I perform clustering? – renataleb Jul 05 '18 at 13:15
You should cluster on the original, unaltered space using something like k-means or spectral clustering. These are implemented in `scikit-learn` as well. – alta Jul 05 '18 at 19:05
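
A minimal sketch of clustering in the original space with `sklearn.cluster.KMeans` might look like the following. The synthetic `X` (three blobs in 20 dimensions) and the choice of `n_clusters=3` are assumptions made purely for illustration; with real data the number of clusters has to be chosen by you.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three groups of 100 rows each, 20 columns.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(100, 20)) for c in centers])

# Cluster directly on all 20 columns, not on a 2D embedding.
# Assumption: n_clusters=3 matches how the toy data was generated.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels.shape)                   # one cluster label per row
print(kmeans.cluster_centers_.shape)  # one 20-dimensional center per cluster
```

`sklearn.cluster.SpectralClustering` exposes the same `fit_predict` method, so it can be swapped in for `KMeans` here.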