SSAS clustering algorithm results vary with same input

Question

I am quite new to data mining and statistics.

I've built a data mining model in Visual Studio. I am using the Microsoft Clustering algorithm and have run into some issues.

I have modified some default parameters: I set the maximum input parameters to 350 and the clustering method to non-scalable expectation maximization (I have 80,000 rows of data). I also set the cluster count to 0 so that the algorithm would choose the best number of clusters itself.

Here comes the problem. I am using tempdb, which is flushed every time I restart my PC (I don't have a lot of free space, so tempdb is a good option in that respect). When I reload the same data and rebuild the mining structure, I get completely different results. One time I got 10 clusters, then 13, and after that 9. I also tried forcing the cluster count to 13 to reproduce the same clusters, but they are still different (the clusters themselves differ in their distribution and size).

My question is: why? Isn't EM deterministic? I would understand small changes in size and distribution, but I get different results every time the DB is flushed. Shouldn't the algorithm give me almost the same results rather than very different ones? Am I doing something wrong?

score 1 · Answer 1 · answered Apr 24 '16 at 21:08


EM (Gaussian mixture modeling) is, just like k-means, usually initialized randomly.

So no, it is not deterministic, and getting different results is normal.
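
To make the role of the initialization concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture in plain Python. It is not how SSAS implements its clustering; the function name `fit_gmm_1d`, the synthetic uniform data, and the choice of picking the starting means at random from the data are all assumptions made for illustration. The only thing that changes between runs is the seed used for the starting means.

```python
import math
import random


def fit_gmm_1d(data, k, seed, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM, starting from randomly chosen means.

    Hypothetical helper for illustration only; returns the sorted component means.
    """
    rng = random.Random(seed)
    means = rng.sample(data, k)  # random initialization: k data points as means
    overall_mean = sum(data) / len(data)
    var0 = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [var0] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no mass; leave it unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsed component

    return sorted(means)


# Synthetic, deliberately non-Gaussian data (uniform on [0, 10]); an assumption
# chosen so that a Gaussian mixture is a poor fit.
data_rng = random.Random(0)
data = [data_rng.uniform(0, 10) for _ in range(300)]

# Same data, same k, different seeds for the starting means.
for seed in (1, 2, 3):
    print(seed, [round(m, 3) for m in fit_gmm_1d(data, 3, seed)])
```

Running with the same seed reproduces the same means exactly, while changing the seed can send EM to a different local optimum. How far apart those optima are depends on how well the data matches the model.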


Has QUIT--Anony-Mousse


But shouldn't the fluctuations in the result be small, not major? – DarkFeud Apr 25 '16 at 14:23
Only if the data is very clean and really Gaussian. If it's a bad fit, there is likely more than one bad fit. – Has QUIT--Anony-Mousse Apr 25 '16 at 18:42
