I'm working on a clustering problem. To make the results reproducible, I set the random_state parameter of KMeans() to 0. However, after updating scikit-learn from version 0.22.2 to version 1.2.2, I ran into an unexpected issue: running the same code on the same dataset produced results that differ from the previous run. I don't know what causes this inconsistency, and I have been unable to reproduce the initial result.
Code:
from sklearn.cluster import KMeans

model = KMeans(n_clusters=5, init='k-means++', tol=0.0001, random_state=0, copy_x=True, algorithm='auto')
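To make the comparison easier to repeat under each installed version, here is a minimal sketch of a self-contained script. It is not my real pipeline: the data come from make_blobs as a stand-in (an assumption, since I can't share the dataset), n_init is set explicitly because its default is scheduled to change in later releases, and algorithm='auto' is left out because that value is deprecated in recent releases. Cluster sizes are sorted because the numbering of clusters is arbitrary.

```python
import numpy as np
import sklearn
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real dataset (assumption: the original data is not shown).
X, _ = make_blobs(n_samples=117, centers=5, random_state=0)

# Same settings as in the question, minus algorithm='auto'; n_init made explicit.
model = KMeans(n_clusters=5, init="k-means++", n_init=10,
               tol=0.0001, random_state=0, copy_x=True)
labels = model.fit_predict(X)

# Cluster ids are arbitrary, so compare sizes in sorted order.
sizes = np.sort(np.bincount(labels, minlength=5))[::-1]

print("scikit-learn", sklearn.__version__)
print("cluster sizes (sorted):", sizes.tolist())
print("inertia:", model.inertia_)
```

Running this once per environment and comparing the printed sizes and inertia should show whether the two versions really converge to different solutions or only number the clusters differently.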
Expected results (number of clusters = 5):
Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
10        | 20        | 12        | 30        | 45
Actual results
Version 0.22.2 (number of clusters = 5):
Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
10        | 5         | 6         | 14        | 5
Version 1.2.2 (number of clusters = 5):
Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
3         | 7         | 20        | 8         | 2
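Since cluster numbers can be permuted between runs, comparing the tables column by column may overstate the difference. A rough sketch of two things that could help: scoring agreement between two labelings with adjusted_rand_score, which ignores how clusters are numbered, and passing explicit initial centers so the fit does not depend on the random initialization. The data and the choice of the first five rows as centers are illustrative assumptions, and fit_labels is a hypothetical helper, not part of scikit-learn. Fixing the centers removes the random initialization, but numerical differences between versions could presumably still change the result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the real dataset (assumption).
X, _ = make_blobs(n_samples=200, centers=5, random_state=0)

# Fixed initial centers: the first 5 rows, chosen purely for illustration.
init_centers = X[:5].copy()

def fit_labels(X, init_centers):
    """Hypothetical helper: fit KMeans from explicit centers and return labels."""
    km = KMeans(n_clusters=len(init_centers), init=init_centers,
                n_init=1, tol=0.0001)
    return km.fit_predict(X)

labels_a = fit_labels(X, init_centers)
labels_b = fit_labels(X, init_centers)

# Same partition with the cluster ids shifted, to mimic renumbered clusters.
permuted = (labels_a + 1) % 5

ari_repeat = adjusted_rand_score(labels_a, labels_b)
ari_permuted = adjusted_rand_score(labels_a, permuted)
print("ARI between repeated fits:", ari_repeat)
print("ARI against renumbered labels:", ari_permuted)
```

In practice the labels from the old environment could be saved to a file and loaded in the new one in place of labels_b; a score close to 1 would mean the two versions found essentially the same partition.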