I have two models trained on the same data. The KMeans model is set up like this:
int numIterations = 20;
int numClusters = 5;
int runs = 10;
double epsilon = 1.0e-6;
KMeans kmeans = new KMeans();
kmeans.setEpsilon(epsilon);
kmeans.setRuns(runs);
kmeans.setMaxIterations(numIterations);
kmeans.setK(numClusters);
KMeansModel model = kmeans.run(trainDataVectorRDD.rdd());
And the StreamingKMeans model is set up like this:
int numOfDimensions = 3;
int numClusters = 5;
StreamingKMeans kmeans = new StreamingKMeans()
    .setK(numClusters)
    .setDecayFactor(1.0)
    .setRandomCenters(numOfDimensions, 1.0, 0);
kmeans.trainOn(trainDataVectorRDD);
The idea with the streaming one is that I read everything off a Kafka queue, train the model on it, and have it update automatically as new data comes in.
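For context on the "auto update" part: as I understand the MLlib clustering guide, each batch moves a center according to c_new = (c * n * a + x * m) / (n * a + m), where c is the current center, n its accumulated weight, x the mean of the batch points assigned to it, m their count, and a the decay factor. Below is a minimal plain-Java sketch of that rule for a single cluster, with no Spark dependency. The class and method names are hypothetical and not part of Spark, and the numbers are made up for illustration.

```java
// Hypothetical helper, not part of Spark: applies the decay-weighted update
// described in the MLlib streaming k-means guide to one cluster center.
class DecayUpdateSketch {

    // center      = current center c
    // weight      = weight n accumulated by the center so far
    // batchMean   = mean x of the new batch points assigned to this cluster
    // batchCount  = number m of those points
    // decayFactor = a; 1.0 keeps all history, 0.0 uses only the latest batch
    // Assumes weight * decayFactor + batchCount > 0.
    static double[] updateCenter(double[] center, double weight,
                                 double[] batchMean, long batchCount,
                                 double decayFactor) {
        double discounted = weight * decayFactor;
        double total = discounted + batchCount;
        double[] updated = new double[center.length];
        for (int i = 0; i < center.length; i++) {
            updated[i] = (center[i] * discounted + batchMean[i] * batchCount) / total;
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] center = {0.0, 0.0};      // made-up starting center
        double[] batchMean = {4.0, 2.0};   // made-up batch mean
        // Same batch, two decay settings: full memory vs. no memory.
        double[] keepAll = updateCenter(center, 10.0, batchMean, 10, 1.0);
        double[] forgetAll = updateCenter(center, 10.0, batchMean, 10, 0.0);
        System.out.println(java.util.Arrays.toString(keepAll));
        System.out.println(java.util.Arrays.toString(forgetAll));
    }
}
```

With a decay factor of 1.0, as in my setup, the old center and the new batch are weighted purely by how many points each represents, so nothing is ever forgotten.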
The two models give me different cluster centers. Where did I go wrong? The regular KMeans result is the correct one. I am posting just 2 of the 5 cluster centers from each model here. Any help is appreciated, thank you =).
Clusters: Kmeans
clusterCenter: [1.41012161E9,20.9157142857143,68.01750871080174]
clusterCenter: [2.20259211E8,0.6811821903787257,36.58268423745944]
Clusters: StreamingKmeans
clusterCenter: [-0.07896129994296074,-1.0194960760532714,-0.4783789312386866]
clusterCenter: [1.3712228467872134,-0.16614353149605163,0.24283231360124224]