Spark KMeans produces deterministic results and not random

Question

I am running Spark KMeans and would like a different random seed on every run, so that each run gives different results. However, this is not what happens. This is the code that I am using:

KMeans kmeans = new KMeans().setK(4).setInitMode("random");
KMeansModel model = kmeans.fit(ds);
Dataset<Row> predictions = model.transform(ds);

I always get the same score and the same clusters. Am I missing something in the code?

Accepted Answer · answered May 15 '23 at 22:19

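I think you're missing the random seed. Spark ML's KMeans has a fixed default seed (derived from the class name), so unless you call setSeed, every run picks the same initial centers and ends up with the same clusters and score: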

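import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Use a different seed on every run (here: the current time)
long seed = System.currentTimeMillis();

// Create the KMeans instance and set the seed explicitly;
// without setSeed, Spark falls back to its fixed default seed
KMeans kmeans = new KMeans().setK(4).setInitMode("random").setSeed(seed);
KMeansModel model = kmeans.fit(ds);
Dataset<Row> predictions = model.transform(ds);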
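To see why the seed alone decides the outcome, here is a minimal plain-Java sketch (no Spark needed) of a seeded "random" initialization: it samples k distinct point indices using a java.util.Random seeded with `seed`. The class SeededInitDemo and the method pickInitialCenters are hypothetical illustrations of the idea behind random init (sample k starting points), not Spark's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class SeededInitDemo {

    // Hypothetical helper: picks k distinct point indices out of numPoints,
    // driven entirely by `seed`. A simplified stand-in for "random" init,
    // NOT Spark's implementation.
    public static List<Integer> pickInitialCenters(int numPoints, int k, long seed) {
        if (k < 0 || k > numPoints) {
            throw new IllegalArgumentException("need 0 <= k <= numPoints");
        }
        Random rng = new Random(seed);
        List<Integer> indices = new ArrayList<>(numPoints);
        for (int i = 0; i < numPoints; i++) {
            indices.add(i);
        }
        // Partial Fisher-Yates shuffle: after k swaps, the first k slots
        // hold a uniform random sample without replacement.
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(numPoints - i);
            Collections.swap(indices, i, j);
        }
        return new ArrayList<>(indices.subList(0, k));
    }

    public static void main(String[] args) {
        int numPoints = 1000;
        int k = 4;

        // Fixed seed (the analogue of never calling setSeed): same picks every time.
        long fixedSeed = 42L;
        List<Integer> run1 = pickInitialCenters(numPoints, k, fixedSeed);
        List<Integer> run2 = pickInitialCenters(numPoints, k, fixedSeed);
        System.out.println("fixed seed, run 1: " + run1);
        System.out.println("fixed seed, run 2: " + run2);
        System.out.println("identical: " + run1.equals(run2));

        // Fresh seed per run: the picks (almost always) change.
        long freshSeed = ThreadLocalRandom.current().nextLong();
        System.out.println("fresh seed " + freshSeed + ": "
                + pickInitialCenters(numPoints, k, freshSeed));
    }
}
```

Running it twice prints the same indices for the fixed seed and, almost certainly, new ones for the fresh seed. Drawing the seed from ThreadLocalRandom instead of System.currentTimeMillis() avoids two jobs that start in the same millisecond getting the same seed, and printing the seed lets you reproduce a good run later by passing it back to setSeed.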

MYK

Thanks a lot! You are right! It seems randomness must be created manually! – Des0lat0r May 16 '23 at 11:04