
I have clustered 43574 time series using the EM clusterer. The output is 24 clusters. I have some questions. First, is it practically useful to deal with 24 clusters? Isn't that too many? If I pass the results to a neurosurgeon to label these clusters for the purpose of patient management, is that going to work? My most important question: as shown below, a couple of clusters have 0% likelihood. What does that mean? Why are they then in separate clusters? Any help would be greatly appreciated. This is what I got:

 0   1892 ( 4%)
 1   5153 (12%)
 2   1594 ( 4%)
 3   1221 ( 3%)
 4    122 ( 0%)
 5   2714 ( 6%)
 6   7092 (16%)
 7    141 ( 0%)
 8    166 ( 0%)
 9    464 ( 1%)
10   3331 ( 8%)
11   4316 (10%)
14   2411 ( 6%)
15   2573 ( 6%)
17   3063 ( 7%)
18    142 ( 0%)
19   4211 (10%)
20    925 ( 2%)
21   2038 ( 5%)
22      5 ( 0%)


1 Answer

These values are not likelihoods, but cluster sizes.

from numpy import array

# cluster sizes taken from the output in the question
data = array([1892, 5153, 1594, 1221, 122, 2714, 7092, 141, 166,
              464, 3331, 4316, 2411, 2573, 3063, 142, 4211, 925, 2038, 5])

# print each cluster's share of all 43574 series as a percentage
for f in data * 100. / sum(data):
    print("%.1f%%" % f, end=" ")

yields the following relative cluster sizes with an additional digit of precision:

4.3% 11.8% 3.7% 2.8% 0.3% 6.2% 16.3% 0.3% 0.4% 1.1% 7.6% 9.9%
5.5% 5.9% 7.0% 0.3% 9.7% 2.1% 4.7% 0.0%

Again, these are not likelihoods; each figure is simply the cluster size divided by the data set size.

  • Anony, it's unclear to me what it means to have 5 clusters that all show 0%. As you said, it's the cluster size. If no data can be assigned to a cluster, why do we get that cluster at all? Also, why do I get several 0% clusters instead of one? – Parisan Apr 15 '16 at 19:57
  • It's not absolutely 0.0000000%, but 5/43574. It's not a probability either; it's just a **very small cluster with just 5 objects**. Probably outliers or bad preprocessing. – Has QUIT--Anony-Mousse Apr 15 '16 at 20:40
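
Following up on the last comment: a minimal sketch (not from the answer; the 1% cut-off is an arbitrary assumption) of how one might flag such tiny clusters for manual review, using the sizes listed in the question:

import numpy as np

# cluster ids and sizes as printed in the question's EM output
cluster_ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
               10, 11, 14, 15, 17, 18, 19, 20, 21, 22]
sizes = np.array([1892, 5153, 1594, 1221, 122, 2714, 7092, 141, 166, 464,
                  3331, 4316, 2411, 2573, 3063, 142, 4211, 925, 2038, 5])

total = sizes.sum()   # 43574 time series in total
threshold = 0.01      # flag clusters below 1% of the data (arbitrary cut-off)

for cid, size in zip(cluster_ids, sizes):
    share = size / total
    if share < threshold:
        print("cluster %2d: %4d series (%.2f%%) -- very small, "
              "possibly outliers or a preprocessing artefact" % (cid, size, 100 * share))

With this cut-off, clusters 4, 7, 8, 18 and 22 are reported, which are exactly the rows shown as 0% in the clusterer's output.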