2

I have hundreds of variables with binary values (1 and 0), and I want to see how these variables fall into different clusters. I don't see any Python methods for this, but I can see one in R: http://arxiv.org/pdf/1112.0295.pdf

For example, I have data with variables (features) a1, a2, a3, a4, ..., a100. Each of them is a binary variable. Instead of clustering the observations, I want to cluster a1, a2, ..., a100 themselves and see which cluster a1 falls into, which cluster a2 falls into, and so on.

Does anyone know of a similar package or method in Python? I tried to set up an R interface in Anaconda so that I could use the R methods, but the interface is not working.

Python 3.4.3 |Anaconda 2.3.0 (64-bit)|

Frank
Sanoj

2 Answers

3

First transpose your data matrix.

Then cluster features instead of instances!
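A minimal sketch of that idea, assuming SciPy is available. The toy data, the Jaccard distance, average linkage, and the choice of two clusters are all illustrative assumptions, not something prescribed above; other binary distances (for example Hamming) could be substituted.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations x 6 binary features.
# Features 0-2 are noisy copies of one hidden signal, features 3-5 of another.
base_a = rng.integers(0, 2, size=200)
base_b = rng.integers(0, 2, size=200)


def noisy_copy(signal, flip_prob=0.05):
    """Return a copy of a 0/1 vector with a small fraction of entries flipped."""
    flips = rng.random(signal.shape[0]) < flip_prob
    return np.where(flips, 1 - signal, signal)


X = np.column_stack(
    [noisy_copy(base_a) for _ in range(3)] + [noisy_copy(base_b) for _ in range(3)]
)

# Transpose so that each row is a feature rather than an observation.
features = X.T.astype(bool)

# Pairwise Jaccard distances between features (assumed metric for binary data).
dist = pdist(features, metric="jaccard")

# Hierarchical clustering on the feature-to-feature distances.
Z = linkage(dist, method="average")

# Cut the tree into at most 2 clusters; labels[i] is the cluster of feature i.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Here `labels` has one entry per feature, so `labels[0]` tells you which cluster a1 landed in.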

Has QUIT--Anony-Mousse
0

The package scikit-learn has exactly what you are looking for.

It contains many clustering algorithms, such as K-means, affinity propagation, mean-shift, spectral clustering, Ward hierarchical clustering, agglomerative clustering, DBSCAN, Gaussian mixtures, and more.
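For clustering variables rather than observations, one option in scikit-learn that may fit is `sklearn.cluster.FeatureAgglomeration`, which applies agglomerative clustering to the columns of the data matrix. The sketch below uses made-up toy data and an assumed `n_clusters=2`; with default settings it uses Ward linkage on Euclidean distances, which is not specifically designed for binary data.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)

# Hypothetical data: 300 observations, 4 binary features.
signal_a = rng.integers(0, 2, size=300)
signal_b = rng.integers(0, 2, size=300)

# Features 0 and 1 follow signal_a, features 2 and 3 follow signal_b.
# The second copy of each signal has its first 10 entries flipped.
a_variant = signal_a.copy()
a_variant[:10] = 1 - a_variant[:10]
b_variant = signal_b.copy()
b_variant[:10] = 1 - b_variant[:10]

X = np.column_stack([signal_a, a_variant, signal_b, b_variant]).astype(float)

# FeatureAgglomeration groups columns (features), not rows (observations).
agglo = FeatureAgglomeration(n_clusters=2)
agglo.fit(X)

# labels_[i] is the cluster assigned to feature i.
labels = agglo.labels_
print(labels)
```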

Niki van Stein
  • All those methods in scikit-learn are applied to observations, not to variables. I have updated the original question to make this clearer. – Sanoj Nov 10 '15 at 18:02
  • @Sanoj, isn't what you want to do similar to PCA? (http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) – Niki van Stein Nov 10 '15 at 20:48
  • You can also look here: http://stats.stackexchange.com/questions/138325/clustering-a-correlation-matrix (the answer there provides code for covariance clustering). – Niki van Stein Nov 10 '15 at 20:50