I am trying to use NLTK's KMeans clustering algorithm. It is generally going fine. Now I want to use NLTK's metrics package to determine precision, recall, and F-measure.

I searched for examples on the web and in other references, but could not find a clue.

If anyone could kindly cite an example or a reference, I would be grateful. Thanks in advance.
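For reference, here is a minimal sketch of such a setup (not the actual code; the vectors and gold labels below are made up). Note that NLTK's `precision`, `recall`, and `f_measure` operate on sets of item ids:

```python
# A minimal sketch: toy 2-D vectors clustered with NLTK's k-means,
# then scored against hypothetical gold labels.
import numpy
from nltk.cluster import KMeansClusterer, euclidean_distance
from nltk.metrics import precision, recall, f_measure

vectors = [numpy.array(v) for v in ([1, 1], [1.5, 2], [8, 8], [8.5, 9])]
clusterer = KMeansClusterer(2, euclidean_distance, repeats=10)
assignments = clusterer.cluster(vectors, assign_clusters=True)

# nltk.metrics scores compare *sets* of item indices, so each gold class
# and each predicted cluster has to be converted to a set first.
gold = {0: {0, 1}, 1: {2, 3}}              # hypothetical reference classes
pred = {}
for i, label in enumerate(assignments):
    pred.setdefault(label, set()).add(i)

# Only meaningful if cluster id k happens to line up with class k,
# which k-means does not guarantee (see the answers below).
for k in gold:
    test = pred.get(k, set())
    print(k, precision(gold[k], test), recall(gold[k], test),
          f_measure(gold[k], test))
```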

Coeus2016

2 Answers


It is hard to evaluate the performance of unsupervised learning, i.e. clustering. It depends entirely on why you are trying to cluster in the first place.

Still, I think there are a few performance metrics available, such as those listed here:

http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation
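For example, a minimal sketch with made-up label arrays (these functions come from scikit-learn, not NLTK):

```python
# Sketch: external cluster-evaluation metrics from scikit-learn,
# applied to made-up gold labels and cluster assignments.
from sklearn import metrics

labels_true = [0, 0, 0, 1, 1, 1]   # hypothetical gold classes
labels_pred = [1, 1, 0, 0, 0, 0]   # hypothetical cluster ids

# Both scores are invariant to how the cluster ids are numbered.
print(metrics.normalized_mutual_info_score(labels_true, labels_pred))
print(metrics.homogeneity_completeness_v_measure(labels_true, labels_pred))
```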

Sagar Waghmode
  • Sometimes people also use B-cubed (http://cs.utsa.edu/~qitian/seminar/Spring11/03_11_11/IR2009.pdf) , e.g. http://www.aclweb.org/anthology/W/W14/W14-2211.pdf (disclaimer: I'm a co-author of that paper). – alvas Mar 29 '16 at 17:53
  • Interesting, would surely love to try sometime. – Sagar Waghmode Mar 29 '16 at 17:54
  • I think purity used to be a common eval metric: for each computed cluster C, let M(C) be the true cluster that best matches C. For document d, let C(d) be the computed cluster containing d and let T(d) be the true cluster containing d. Then purity = the fraction of d for which M(C(d)) = T(d) (see the sketch after these comments). – alvas Mar 29 '16 at 17:55
  • There's one b-cubed scorer for an entity clustering task from http://www.nist.gov/tac/2012/KBP/tools/el_scorer.py too =) – alvas Mar 29 '16 at 17:56
  • I think this holds true provided you know the clusters beforehand. A lot of the time I do clustering to find patterns in the data, with very little information about which items belong together. – Sagar Waghmode Mar 29 '16 at 17:57
  • Agree. Clustering is unsupervised but the need for "evaluation" makes it "pseudo-supervised". – alvas Mar 29 '16 at 17:58
  • Thank you for the nice pointers and discussion. I generally juggle with models to learn better. I am presently trying to do sequence labelling with clustering. By the way, I found another good library, Pattern. – Coeus2016 Mar 30 '16 at 01:23
  • Can you please share it with us as well? – Sagar Waghmode Mar 30 '16 at 07:36
  • Please see [Pattern](http://www.clips.ua.ac.be/pattern); it does not seem to depend on NLTK, unlike [Textblob](https://textblob.readthedocs.org/en/dev/). – Coeus2016 Mar 30 '16 at 16:52
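Following up on the purity definition in the comments above, a minimal sketch (the labels are made up):

```python
# Sketch of purity as defined above: each computed cluster is credited
# with the items of its best-matching (majority) true class.
from collections import Counter

def purity(true_labels, cluster_labels):
    """Both arguments are parallel sequences of labels."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    # Count, per cluster, the items of its majority true class.
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return correct / float(len(true_labels))

print(purity([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0]))  # 0.833...
```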

Precision, recall, and thus the F-measure are inappropriate for cluster analysis. Clustering is not classification, and clusters are not classes!

Common measures for clustering, if you are comparing against existing labels (which does not make a whole lot of sense: if you already know the classes, use classification, not clustering), are the Adjusted Rand Index and its variants.
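For example, with scikit-learn (made-up labels; note the score is invariant to how the cluster ids are numbered):

```python
# Sketch: Adjusted Rand Index, which corrects the Rand Index for
# chance agreement and ignores the numbering of cluster ids.
from sklearn.metrics import adjusted_rand_score

labels_true = [0, 0, 0, 1, 1, 1]   # hypothetical gold classes
labels_pred = [1, 1, 0, 0, 0, 0]   # hypothetical clustering
print(adjusted_rand_score(labels_true, labels_pred))
```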

Has QUIT--Anony-Mousse
  • Thank you for your kind reply. But researchers do treat clusters as classes, and PoS tagging or NER is possible through clustering. If we cannot use the F-measure, then what would be a common metric to evaluate both a classifier and a clustering on the same problem? And isn't this script (http://www.stat.cmu.edu/~cshalizi/490/10/clustering/clustering02.r) using a confusion matrix with k-means? Please correct me if I am misinterpreting it. – Coeus2016 Mar 30 '16 at 10:29
  • The problem is that there is no 1:1 correspondence of clusters and classes. It's not as if the clustering algorithm would produce e.g. "android" and "apple" classes. There is a reason why the author of that r script put quotation marks around the term "confusion matrix" - while it is computed the same way as you would do in classification, it has a different semantic and must not be evaluated the same way. – Has QUIT--Anony-Mousse Mar 30 '16 at 10:33
  • I can only recommend to **not compare clustering and classification results**. It's apples and oranges. – Has QUIT--Anony-Mousse Mar 30 '16 at 10:38