
Is it possible to get the same kmeans clusters on every execution for a particular data set? Just as we can use a fixed seed to reproduce a random value, is it possible to remove the randomness from clustering?

Gavin Simpson
Robin

2 Answers


Yes. Call set.seed to fix the seed of the random number generator immediately before doing the clustering.

Using the example from the kmeans help page:

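# Simulate two Gaussian clusters in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

# Fit twice, resetting the seed to the same value before each call
set.seed(2)
XX <- kmeans(x, 2)

set.seed(2)
YY <- kmeans(x, 2)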

Test for equality:

identical(XX, YY)
[1] TRUE
Andrie
  • A note on why `set.seed` must be called each time: the output of a random number generation function depends on the value of `.Random.seed`, which changes every time such a function is executed. Ref: https://r-coder.com/set-seed-r/ – Zhilong Jia Nov 08 '21 at 01:04
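
The point about `.Random.seed` can be checked directly. The sketch below reuses the simulated data from the answer; the names `seed_before`, `seed_after`, `fit1` and `fit3` are illustrative only. It shows that a kmeans call advances the generator state, which is why the seed has to be reset before each run that should match.

```r
# Simulated data, as in the answer above
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

set.seed(2)
seed_before <- .Random.seed   # generator state right after seeding
fit1 <- kmeans(x, 2)          # picks random starting centers
seed_after <- .Random.seed    # state after kmeans has drawn from the generator

# The state has moved on, so a second call would start from different draws
print(identical(seed_before, seed_after))

# Resetting the seed restores the same starting state and the same clustering
set.seed(2)
fit3 <- kmeans(x, 2)
print(identical(fit1$cluster, fit3$cluster))
```

A second kmeans call made without resetting the seed may or may not land on the same partition for well-separated data like this, but nothing guarantees it.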

Yes, calling set.seed(foo) immediately before running kmeans(...) gives the same random start and hence the same clustering each time. Here foo is the seed, such as 42 or any other numeric value.
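
As a possible alternative that neither answer proposes, the random start can be avoided altogether by passing explicit starting centers: when `centers` is a matrix and `nstart` is left at its default of 1, kmeans does not pick random rows. The sketch below assumes rows 1 and 51 of the example data are reasonable starting points (one from each simulated group); the names `init`, `a` and `b` are illustrative.

```r
# Simulated data, as in the first answer
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Hypothetical choice of starting centers: one row from each simulated group
init <- x[c(1, 51), ]

# Different seeds on purpose: with fixed centers the seed no longer matters
set.seed(123)
a <- kmeans(x, centers = init)

set.seed(999)
b <- kmeans(x, centers = init)

print(identical(a$cluster, b$cluster))
```

The trade-off is that the result now depends on how good the chosen starting centers are, whereas seeding keeps the random initialisation and only makes it repeatable.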

Gavin Simpson
  • Just to add to Andrie's and Gavin's responses: I've tested that even when the `nstart` argument of kmeans() is set larger than 1, i.e. with multiple random starts, set.seed() still gives reproducible, identical results. – X.X Sep 16 '17 at 00:07
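
The `nstart` observation can be reproduced with a short sketch. The value `nstart = 25` and the seed 42 are arbitrary choices for illustration, and `a` and `b` are illustrative names.

```r
# Simulated data, as in the first answer
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# 25 random starts, keeping the best one; the seed fixes all 25 draws
set.seed(42)
a <- kmeans(x, centers = 2, nstart = 25)

set.seed(42)
b <- kmeans(x, centers = 2, nstart = 25)

print(identical(a$cluster, b$cluster))
print(identical(a$tot.withinss, b$tot.withinss))
```

All the random starts are drawn from the same generator stream, so seeding once before the call is enough to make the whole multi-start run repeatable.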