I am quite new to data mining and statistics.
I've built a data mining model in Visual Studio using the Microsoft Clustering algorithm, and I've run into some issues.
I have changed some of the default parameters: I set the maximum input attributes to 350 and the clustering method to non-scalable expectation maximization (I have 80,000 rows of data). I also set the cluster count to 0 so that the algorithm chooses the best number of clusters itself.
Here is the problem. I am using tempdb, which is flushed every time I restart my PC (I don't have a lot of free space, so tempdb is a good option in that respect). When I reload the same data and rebuild the mining structure, I get completely different results: one time I got 10 clusters, then 13, and after that 9. I also tried forcing the cluster count to 13 to reproduce the same clusters, but they still come out different (the clusters themselves differ in distribution and size).
My question is: why? Isn't EM deterministic? I would understand small changes in cluster size and distribution, but I get different results every time the database is flushed. Shouldn't the algorithm give me almost the same results rather than very different ones? Am I doing something wrong?
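To make clear what I mean by "deterministic", here is a minimal Python sketch of EM for a one-dimensional Gaussian mixture. It is only an illustration, not the Microsoft implementation: the function `em_1d`, the toy data, and the seed values are all made up for this example. As far as I understand, the E and M steps are deterministic once the starting point is fixed, so the same seed should give the same clusters every time, while a different seed changes only the initial means and may or may not end up at the same solution.

```python
import math
import random


def em_1d(data, k, seed, iters=50):
    """Fit a 1-D Gaussian mixture with EM and return the sorted cluster means.

    The seed affects only the initial means (hypothetical helper for illustration).
    """
    rng = random.Random(seed)
    means = rng.sample(data, k)      # random starting means drawn from the data
    variances = [1.0] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each cluster for each point
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # keep the variance strictly positive
            weights[j] = nj / len(data)

    return sorted(round(m, 3) for m in means)


# Toy data: three groups centred at 0, 5 and 10 (made up for this example)
data_rng = random.Random(42)
data = [data_rng.gauss(c, 1.0) for c in (0, 5, 10) for _ in range(100)]

for seed in (1, 1, 2, 3):
    print(seed, em_1d(data, 3, seed))
```

Running it with the same seed twice prints identical means, which is the behaviour I expected from the clustering model when I rebuild it on the same data.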