
I have clustered 43574 time series using the EM clusterer. The output is 24 clusters. I have some questions. First, is it practically useful to deal with 24 clusters? Isn't that too many? If I pass the results to a neurosurgeon to label these clusters for the purpose of patient management, is that going to work? My most important question: as shown below, a couple of clusters have 0% likelihood. What does that mean? Why are they then in separate clusters? Any help would be greatly appreciated. This is what I got:

 0   1892 ( 4%)
 1   5153 (12%)
 2   1594 ( 4%)
 3   1221 ( 3%)
 4    122 ( 0%)
 5   2714 ( 6%)
 6   7092 (16%)
 7    141 ( 0%)
 8    166 ( 0%)
 9    464 ( 1%)
10   3331 ( 8%)
11   4316 (10%)
14   2411 ( 6%)
15   2573 ( 6%)
17   3063 ( 7%)
18    142 ( 0%)
19   4211 (10%)
20    925 ( 2%)
21   2038 ( 5%)
22      5 ( 0%)


1 Answer

These values are not likelihoods, but cluster sizes.

from numpy import array

# cluster sizes taken from the output in the question
data = array([1892, 5153, 1594, 1221, 122, 2714, 7092, 141, 166,
              464, 3331, 4316, 2411, 2573, 3063, 142, 4211, 925, 2038, 5])

# print each cluster's share of all 43574 series as a percentage
for f in data * 100. / sum(data):
    print("%.1f%%" % f, end=" ")

yields the following relative cluster sizes with an additional digit of precision:

4.3% 11.8% 3.7% 2.8% 0.3% 6.2% 16.3% 0.3% 0.4% 1.1% 7.6% 9.9%
5.5% 5.9% 7.0% 0.3% 9.7% 2.1% 4.7% 0.0%

Again, these are not likelihoods; each figure is simply the cluster size divided by the data set size.

  • Anony, it's unclear to me what it means to have 5 clusters that all show 0%. As you said, it's the cluster size. If no data can be assigned to a cluster, why do we get that cluster at all? Also, why do I get several 0% clusters instead of one? – Parisan Apr 15 '16 at 19:57
  • It's not absolutely 0.0000000%, but 5/43574. It's not a probability either; it's just a **very small cluster with just 5 objects**. Probably outliers or bad preprocessing. – Has QUIT--Anony-Mousse Apr 15 '16 at 20:40
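
Following up on the last comment: a minimal sketch (not from the answer; the 1% cut-off is an arbitrary assumption) of how one might flag such tiny clusters for manual review, using the sizes listed in the question:

import numpy as np

# cluster ids and sizes as printed in the question's EM output
cluster_ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
               10, 11, 14, 15, 17, 18, 19, 20, 21, 22]
sizes = np.array([1892, 5153, 1594, 1221, 122, 2714, 7092, 141, 166, 464,
                  3331, 4316, 2411, 2573, 3063, 142, 4211, 925, 2038, 5])

total = sizes.sum()   # 43574 time series in total
threshold = 0.01      # flag clusters below 1% of the data (arbitrary cut-off)

for cid, size in zip(cluster_ids, sizes):
    share = size / total
    if share < threshold:
        print("cluster %2d: %4d series (%.2f%%) -- very small, "
              "possibly outliers or a preprocessing artefact" % (cid, size, 100 * share))

With this cut-off, clusters 4, 7, 8, 18 and 22 are reported, which are exactly the rows shown as 0% in the clusterer's output.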