
I would like to calculate the entropy of this example clustering scheme:

http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

[Image: the clustering example from the linked page]

Can anybody please explain this step by step with real values? I know there are an unlimited number of formulas, but I am really bad at understanding formulas :)

For example, for the given image, how to calculate purity is explained clearly and well.

The question is very clear. I need an example of how to calculate the entropy of this clustering scheme. It can be a step-by-step explanation, or it can be C# or Python code that calculates it.

Here is the entropy formula:

I will code this in C#

Thank you very much for any help

[Image: entropy formula]

I need an answer like the one given here: https://stats.stackexchange.com/questions/95731/how-to-calculate-purity

Furkan Gözükara
  • http://stackoverflow.com/questions/35760706/how-to-calculate-clustering-entropy-example-and-my-solution-given-but-is-it-co – Mitch Wheat Mar 03 '16 at 05:19
  • I'm voting to close this question as off-topic because it appears to be a statistics question – Shog9 Mar 13 '16 at 00:02

2 Answers

I will admit that this section of the NLP book is a little confusing: it does not follow through with the complete calculation of the external measure of cluster entropy, and instead focuses on the entropy of an individual cluster. I will try to use a more intuitive set of variables and include the complete method for calculating the external measure of total entropy.

The total entropy of a clustering is:

$$H(\Omega) = \sum_{w \in \Omega} H(w) \cdot \frac{N_w}{N}$$

where:

$\Omega$ is the set of clusters

H(w) is a single cluster's entropy

N_w is the number of points in cluster w

N is the total number of points.

The entropy of a single cluster w is:

$$H(w) = -\sum_{c \in C} P(w_c) \log_2 P(w_c)$$

where:

c is a classification in the set C of all classifications

P(w_c) is the probability of a data point in cluster w being classified as c.

To make this usable, we can substitute the MLE (maximum likelihood estimate) of this probability, which is simply the observed fraction, to arrive at:

$$H(w) = -\sum_{c \in C} \frac{|w_c|}{N_w} \log_2 \frac{|w_c|}{N_w}$$

where:

|w_c| is the count of points classified as c in cluster w

N_w is the count of points in cluster w
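The count-based formula above can be sketched in Python roughly as follows. This is only an illustration: the function name `cluster_entropy` and the plain list of per-class counts are assumptions made for this example, not anything taken from the book. A class with a count of zero is skipped, which amounts to treating 0 * log_2(0) as 0.

```python
import math

def cluster_entropy(class_counts):
    """Entropy in bits of one cluster, given the count of each class inside it.

    `class_counts` is a hypothetical input format: one integer per class.
    """
    cluster_size = sum(class_counts)
    entropy = 0.0
    for count in class_counts:
        if count > 0:  # skip empty classes: 0 * log2(0) is treated as 0
            p = count / cluster_size  # observed fraction (the MLE of the probability)
            entropy -= p * math.log2(p)
    return entropy

# First cluster of the example: 5 x, 1 circle, 0 diamonds
print(round(cluster_entropy([5, 1, 0]), 3))  # prints 0.65
```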

So in the example given you have 3 clusters (w_1, w_2, w_3), and we calculate the entropy of each cluster separately, summing over the 3 classifications (x, circle, diamond). Note the leading minus sign, which makes each entropy non-negative, and that a term with a zero count contributes 0.

H(w_1) = -[(5/6)log_2(5/6) + (1/6)log_2(1/6) + (0/6)log_2(0/6)] = 0.650

H(w_2) = -[(1/6)log_2(1/6) + (4/6)log_2(4/6) + (1/6)log_2(1/6)] = 1.252

H(w_3) = -[(2/5)log_2(2/5) + (0/5)log_2(0/5) + (3/5)log_2(3/5)] = 0.971

Then, to find the total entropy for a set of clusters, you take the sum of the cluster entropies, each multiplied by the relative weight of its cluster.

H(Omega) = (0.650 * 6/17) + (1.252 * 6/17) + (0.971 * 5/17)

H(Omega) = 0.957
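To check the arithmetic, the whole calculation can be reproduced with a short Python sketch. The names `cluster_entropy` and `total_entropy` and the nested-list layout of the counts are assumptions made for this example. The code applies the leading minus sign from the entropy definition, so every value it returns is non-negative.

```python
import math

def cluster_entropy(class_counts):
    """Entropy in bits of one cluster from its per-class counts."""
    size = sum(class_counts)
    # Classes with a zero count are skipped, i.e. 0 * log2(0) is taken as 0.
    return -sum((c / size) * math.log2(c / size) for c in class_counts if c > 0)

def total_entropy(clusters):
    """Sum of the cluster entropies, each weighted by its share of all points."""
    n = sum(sum(counts) for counts in clusters)
    return sum(cluster_entropy(counts) * sum(counts) / n for counts in clusters)

# Counts per cluster in the order (x, circle, diamond), as in the example
clusters = [
    [5, 1, 0],  # w_1
    [1, 4, 1],  # w_2
    [2, 0, 3],  # w_3
]

for counts in clusters:
    print(round(cluster_entropy(counts), 3))
print(round(total_entropy(clusters), 3))
```

The weights 6/17, 6/17 and 5/17 are not hard-coded; they fall out of the counts, so the same functions work for any number of clusters and classes.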

I hope this helps, please feel free to verify and provide feedback.

Snives

The computation is straightforward.

The probabilities are NumberOfMatches/NumberOfCandidates, computed for every label in every cluster. Then you apply base-2 logarithms, multiply each logarithm by its probability, sum the terms, and negate the result. Usually, you will weight the clusters by their relative sizes.

The only thing to pay attention to is the case p = 0, where the logarithm is undefined. But we can safely use p log p = 0 if p = 0, because the factor p outside the logarithm drives the product to 0 as p approaches 0.

Because log 1 = 0 the minimum entropy is 0. Perfect results must score entropy 0, or you have an error.
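As a rough illustration of these points, here is a Python sketch that starts from raw labels rather than precomputed counts. The function name `clustering_entropy` and its two parallel input lists are assumptions made for this example. Because a `Counter` only stores labels that actually occur in a cluster, the p = 0 case never reaches the logarithm.

```python
import math
from collections import Counter, defaultdict

def clustering_entropy(cluster_ids, labels):
    """Size-weighted entropy in bits of a clustering.

    `cluster_ids[i]` is the cluster assigned to point i and `labels[i]` is its
    true class (both hypothetical input formats for this sketch).
    """
    per_cluster = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        per_cluster[cluster_id][label] += 1

    n = len(labels)
    total = 0.0
    for counts in per_cluster.values():
        size = sum(counts.values())
        # p = NumberOfMatches / NumberOfCandidates for each label in the cluster
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += h * size / n  # weight by relative cluster size
    return total

# A perfect clustering: every cluster holds a single label
perfect = clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])
# A maximally mixed clustering of two labels: one bit of entropy per cluster
mixed = clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])
```

With these inputs, `perfect` should come out as zero and `mixed` as one bit, which matches the rule that a perfect result must score entropy 0.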

Has QUIT--Anony-Mousse
  • ok so for first cluster P(wk) is = 4/5 right? N is cluster count right? so i calculate for first cluster like this one - ( (4/5 * log(4/5)) / (4/3 * log(4/3))) am i correct? then i calculate like this for each cluster and sum all? – Furkan Gözükara Mar 01 '16 at 08:52
  • You have to look at every label in every cluster, not only the majority label. You have 4/5 and 1/5 in the first cluster. – Has QUIT--Anony-Mousse Mar 01 '16 at 08:54
  • can you write first cluster calculation if possible ty very much. and after each cluster calculation i sum up all right? – Furkan Gözükara Mar 01 '16 at 08:55
  • can you update your answer like here? http://stats.stackexchange.com/questions/95731/how-to-calculate-purity – Furkan Gözükara Mar 01 '16 at 11:52