I am trying to understand how I can split my data into clusters using unsupervised learning, for example with the k-means method.

My data has 20 columns. How can it be projected onto a 2D surface without losing the necessary information from the other 18 columns?

What should I use to do that?

Any help will be appreciated.

renataleb

1 Answer

If you are simply interested in viewing your data in 2 dimensions, consider using t-SNE. The scikit-learn Python package has an implementation you can use (`sklearn.manifold.TSNE`). However, remember that you shouldn't cluster your data on the t-SNE output, because the space your data resides in gets considerably distorted in the process: only short distances are maintained, whereas longer distances are heavily altered and may become either shorter or longer.
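As a rough illustration of the visualization step, here is a minimal sketch using scikit-learn's `TSNE`. The random 200 x 20 array is only a placeholder for your own data, and the scaling step and the `perplexity=30` setting are my assumptions rather than recommendations, so adjust them to your case.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): 200 rows, 20 columns. Replace with your own array.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))

# Standardize the columns so that no single column dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Project to 2 dimensions. Use this output for plotting only, not for clustering.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X_scaled)

print(X_2d.shape)  # one (x, y) pair per row of X
```

The two columns of `X_2d` can then be passed to any scatter plot function.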

alta
  • Thank you for the answer. And how can I perform clustering? – renataleb Jul 05 '18 at 13:15
  • You should cluster in the original, unaltered space using something like k-means or spectral clustering. These are implemented in `scikit-learn` as well. – alta Jul 05 '18 at 19:05
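
To make the last comment concrete, here is a minimal sketch of k-means run on all 20 columns with scikit-learn's `KMeans`. The random data is a placeholder, and `n_clusters=3` and the standardization step are assumptions chosen for illustration; the right number of clusters depends on your data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): 200 rows, 20 columns. Replace with your own array.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))

# Standardize the columns; k-means relies on Euclidean distance,
# so columns with large ranges would otherwise dominate.
X_scaled = StandardScaler().fit_transform(X)

# Cluster in the full 20-dimensional space, not on a 2D projection.
# n_clusters=3 is an arbitrary choice for this sketch.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print(labels.shape)                   # one cluster label per row
print(kmeans.cluster_centers_.shape)  # one 20-dimensional center per cluster
```

The resulting `labels` can be used to color the points of a separate 2D projection, so the plot shows clusters that were found in the original space.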