7

What does it mean to evaluate clusters in the WEKA framework? Clustering is an unsupervised approach to grouping objects, so what does it mean to evaluate the result? And what does it mean to evaluate the clusters on the training data itself?

Thanks Abhishek S

London guy
  • Weka is pretty much nonexistent when it comes to clustering. If you are interested in clustering (which is a bit more complicated than classification), look for alternatives. Some pointers about evaluation: pair-counting F-measure, Adjusted Rand Index (ARI), Fowlkes-Mallows index, Jaccard index, BCubed measures, etc. I don't think Weka has any of these. – Has QUIT--Anony-Mousse Jun 04 '12 at 20:14

1 Answer

13

As written on this page:

Evaluation. The way Weka evaluates the clusterings depends on the cluster mode you select. Four different cluster modes are available (as buttons in the Cluster mode panel):

  1. Use training set (default). After generating the clustering Weka classifies the training instances into clusters according to the cluster representation and computes the percentage of instances falling in each cluster. For example, the above clustering produced by k-means shows 43% (6 instances) in cluster 0 and 57% (8 instances) in cluster 1.
  2. In Supplied test set or Percentage split mode, Weka can evaluate clusterings on separate test data if the cluster representation is probabilistic (e.g. for EM).
  3. Classes to clusters evaluation. In this mode Weka first ignores the class attribute and generates the clustering. Then during the test phase it assigns classes to the clusters, based on the majority value of the class attribute within each cluster. Then it computes the classification error, based on this assignment and also shows the corresponding confusion matrix. An example of this for k-means is shown below.
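
The Weka output referred to in the quote is not reproduced here. As a rough stand-in, the sketch below works through the two computations described above in plain Python: the per-cluster percentages of the "Use training set" mode, and the majority-class mapping, error rate and confusion counts of "Classes to clusters evaluation". It follows the quoted description only; Weka's own implementation may differ in detail. The function names and the toy data (14 instances split 6/8, with made-up "yes"/"no" class labels) are assumptions for illustration.

```python
from collections import Counter, defaultdict


def cluster_distribution(assignments):
    """Count and percentage of instances per cluster ("Use training set" mode)."""
    counts = Counter(assignments)
    total = len(assignments)
    return {c: (n, 100.0 * n / total) for c, n in sorted(counts.items())}


def classes_to_clusters(assignments, labels):
    """Label each cluster with its majority class, then score that labelling.

    The class labels are not used to build the clustering; they are only
    consulted afterwards, as in the quoted description.
    """
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(assignments, labels):
        per_cluster[cluster][label] += 1

    # Majority class within each cluster.
    mapping = {c: counts.most_common(1)[0][0] for c, counts in per_cluster.items()}

    # An instance is an error if its class differs from its cluster's label.
    errors = sum(1 for c, y in zip(assignments, labels) if mapping[c] != y)
    confusion = {c: dict(counts) for c, counts in per_cluster.items()}
    return mapping, errors, errors / len(labels), confusion


# Hypothetical toy data: cluster ids as some clusterer might assign them,
# plus class labels that were held back during clustering.
assignments = [0] * 6 + [1] * 8
labels = ["yes"] * 4 + ["no"] * 2 + ["no"] * 5 + ["yes"] * 3

distribution = cluster_distribution(assignments)
mapping, errors, error_rate, confusion = classes_to_clusters(assignments, labels)

for cluster, (count, percent) in distribution.items():
    print(f"cluster {cluster}: {count} instances ({percent:.0f}%)")
print("cluster -> class:", mapping)
print(f"incorrectly clustered: {errors} of {len(labels)} ({error_rate:.1%})")
print("class counts per cluster:", confusion)
```

On this toy data the distribution reproduces the 43% / 57% split from the quote, cluster 0 is labelled "yes" and cluster 1 "no", and 5 of the 14 instances count as errors. This also answers the original question: "evaluating on the training data" means that the same instances used to build the clusters are assigned back to them and summarised, rather than held-out data.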
Sicco
  • Thanks for the reference and the elaborative answer. – London guy Jun 04 '12 at 11:13
  • Any idea how classes to clusters evaluation works for EM? Does it weight the instances by their probability of being in the cluster when determining the majority value? – kylejmcintyre Oct 26 '14 at 20:59
  • Sicco, could you check this question if possible: http://stackoverflow.com/questions/32404742/how-to-calculate-clustering-success-pre-assigment-true-classes-are-known – Furkan Gözükara Sep 04 '15 at 19:36