I am currently studying CluStream, and I have some doubts about its results. Let me explain:
If the micro clusters are clustered using k-means, every micro cluster belongs to the closest macro cluster (measured by the Euclidean distance between the centers).
Now, looking at the following sample result:
we can see that the macro clusters do not group all the micro clusters …
What does this mean? How should we treat the micro clusters that do not lie inside any macro cluster? Should I find the closest macro cluster for every micro cluster in order to label them?
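To make the labeling idea concrete, here is a minimal sketch of what I mean by assigning each micro cluster to its closest macro cluster. The function name `label_micro_clusters` and the sample centers are hypothetical and made up for illustration; this is not taken from MOA.

```python
import math

def label_micro_clusters(micro_centers, macro_centers):
    """Return, for each micro-cluster center, the index of the closest macro center."""
    labels = []
    for m in micro_centers:
        # Euclidean distance from this micro center to every macro center
        distances = [math.dist(m, c) for c in macro_centers]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical 2-D centers, for illustration only
micro_centers = [(0.0, 0.0), (1.0, 0.5), (9.0, 9.0), (4.0, 4.0)]
macro_centers = [(0.5, 0.5), (8.0, 8.0)]

labels = label_micro_clusters(micro_centers, macro_centers)
print(labels)
```

Every micro cluster gets a label this way, no matter how far it is from the macro center it is assigned to.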
EDIT:
Checking the MOA source code on GitHub, I found that a macro cluster's radius is calculated by multiplying the average deviation by the so-called ‘radius factor’ (whose value is fixed at 1.8). However, when I ask the macro clusters for their weights, using a huge time window and no fading component, I can see that the macro clusters summarize the information of all the points ... all the current micro clusters are considered! So, even if we see some micro clusters that lie outside the macro clusters' spheres, we know that they belong to the closest one - it's k-means after all!
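Here is a rough sketch of how I understand that radius computation. It assumes that the "average deviation" is the per-dimension standard deviation averaged over all dimensions, derived from the cluster feature sums (count, linear sum, squared sum); that is my reading and may not match the MOA code exactly. The helper `radius_from_cf` and the sample points are hypothetical.

```python
import math

RADIUS_FACTOR = 1.8  # the fixed value mentioned above

def radius_from_cf(n, ls, ss, radius_factor=RADIUS_FACTOR):
    """Radius = radius_factor * mean per-dimension standard deviation (assumed definition)."""
    stds = []
    for ls_d, ss_d in zip(ls, ss):
        variance = ss_d / n - (ls_d / n) ** 2
        stds.append(math.sqrt(max(variance, 0.0)))  # guard against tiny negative values
    return radius_factor * sum(stds) / len(stds)

# Hypothetical points summarized by one macro cluster
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
n = len(points)
ls = [sum(p[d] for p in points) for d in range(2)]       # linear sum per dimension
ss = [sum(p[d] ** 2 for p in points) for d in range(2)]  # squared sum per dimension
center = [v / n for v in ls]

radius = radius_from_cf(n, ls, ss)

# A hypothetical micro-cluster center that is assigned to this macro cluster
# but lies farther from the center than the radius
micro_center = (3.0, 3.0)
outside = math.dist(center, micro_center) > radius
print(radius, outside)
```

Under this reading the radius only describes the spread of the summarized points, so a micro cluster can be counted in the macro cluster's weight and still be drawn outside its sphere.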
So, I still have a question: why is the macro cluster radius calculated that way? I mean, what does it represent? Shouldn't the algorithm return the labeled micro clusters instead?
Any feedback is welcome. TIA!