
Say I have a high dimensional dataset which I assume to be well separable by some kind of clustering algorithm. And I run the algorithm and end up with my clusters.

Is there any way (preferably not "hacky" or some kind of heuristic) to explain "what features and thresholds were important in making the members of cluster A (for example) part of cluster A?"

I have tried looking at cluster centroids but this gets tedious with a high dimensional dataset.

I have also tried fitting a decision tree to my cluster labels and then reading off the decision path that most members of a given cluster follow. Similarly, I have tried fitting an SVM to my clusters and then running LIME on the samples closest to the centroids, to get an idea of which features were important in classifying points near the centroids.
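A minimal sketch of the surrogate-tree idea described above, using a shallow scikit-learn decision tree fitted to k-means labels; the dataset, depth, and feature names here are illustrative assumptions, not the asker's actual setup:

```python
# Hypothetical sketch: fit a shallow decision tree on cluster labels
# and read off the rules that characterize each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the high-dimensional dataset (assumption).
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Keep the tree shallow so the extracted rules stay human-readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
print(export_text(tree, feature_names=[f"f{i}" for i in range(X.shape[1])]))
```

If the shallow tree reproduces the cluster labels with high accuracy, its split thresholds are a compact description of the cluster boundaries; if it cannot, the clusters are probably not axis-aligned and a rule-based explanation will be misleading.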

However, both of these approaches repurpose supervised learning in an unsupervised setting and feel "hacky" to me, whereas I'd like something more grounded.

lrosen
  • I am afraid looking at the centroids will probably be your best bet for k-means clustering – modesitt Jul 16 '18 at 14:32
  • Thanks @modesitt, are there other clustering algorithms that yield more explainable results? – lrosen Jul 16 '18 at 14:35

2 Answers

0

Have you tried using PCA or some other dimensionality-reduction technique and checking whether the clusters still hold? Relationships that exist in the full space sometimes survive in lower dimensions (caveat: the projection doesn't always aid one's understanding of the data). There is a good article about visualizing MNIST data: http://colah.github.io/posts/2014-10-Visualizing-MNIST/. I hope this helps a bit.
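A sketch of the check suggested above: cluster in the full space, project to 2-D with PCA, and verify the assignments still look separated (here via silhouette score). The synthetic data and all parameter choices are assumptions for demonstration:

```python
# Illustrative check: do clusters found in high dimensions survive a
# PCA projection to 2-D? Data and parameters are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, n_features=20, centers=3, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

X_2d = PCA(n_components=2).fit_transform(X)
# A silhouette that stays high in 2-D suggests the projection preserves
# the cluster structure well enough to plot and inspect directly.
sil = silhouette_score(X_2d, labels)
print(sil)
```

If the 2-D silhouette collapses while the full-space clustering was good, the structure lives in more dimensions than the projection keeps, and a nonlinear method (t-SNE, UMAP) may be a better visual check.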

Tank
-1

Do not treat the clustering algorithm as a black box.

Yes, k-means uses centroids. But most algorithms for high-dimensional data don't (and don't use k-means!). Instead, they will often select some features, projections, subspaces, manifolds, etc. So look at what information the actual clustering algorithm provides!
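For the k-means case specifically, a hedged sketch of "use what the algorithm itself provides": rank features by how far each fitted centroid deviates from the global mean, in units of the feature's standard deviation, rather than eyeballing raw centroids. The data and the top-3 cutoff are illustrative assumptions:

```python
# Sketch: rank features per cluster by standardized centroid deviation,
# using only what KMeans itself exposes (cluster_centers_).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real dataset (assumption).
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=2)
km = KMeans(n_clusters=3, n_init=10, random_state=2).fit(X)

# z-score each centroid coordinate against the global feature distribution.
z = (km.cluster_centers_ - X.mean(axis=0)) / X.std(axis=0)
for k, row in enumerate(z):
    top = np.argsort(-np.abs(row))[:3]
    print(f"cluster {k}: most distinctive features {top.tolist()}")
```

The same idea transfers to other algorithms: read the selected subspaces from a subspace-clustering method, or the core-point neighborhoods from DBSCAN, instead of forcing everything through centroids.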

Has QUIT--Anony-Mousse