I have two blobs of points in a 2D plane that overlap slightly. When I fit KMeans with 2 clusters and colour the plane using the prediction for each point on a lattice, the result looks very different from the original dataset's classification.

My guess is that it is difficult for KMeans to find the two clusters here, so the predictions aren't very reliable. What I find weird is that so many data points are clustered as purple and then predicted as yellow, and vice versa.

So what is the explanation for this?

If I increase the separation between the two blobs, the effect is much less noticeable, but still there.

Any ideas?

1 Answer

It is possible that KMeans is struggling with the slight overlap between the two blobs, so that some points end up in a different cluster from the blob they were generated from. KMeans assigns every point to its nearest centroid, so points from one blob that lie closer to the other centroid are grouped with the other cluster. The initial placement of the centroids can also affect the final result, so it may help to try a different initialization method or different algorithm parameters (in scikit-learn, `init` and `n_init`). Another possible approach is to try a different clustering algorithm, such as DBSCAN or OPTICS, and check whether it separates the overlapping blobs better.

As for the colour assignments, KMeans is an unsupervised algorithm: the cluster indices it returns are arbitrary, so they need not match the labels of the original blobs and may not be the same each time it is run.
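To check which of these effects is at play, a minimal sketch along the following lines may help. It assumes scikit-learn's `KMeans` and uses `make_blobs` with made-up centres and spread as a stand-in for the original data, so the names `X`, `y_true` and the grid limits are illustrative only. It compares `predict` with the labels found during fitting, compares lattice predictions with a plain nearest-centroid rule, and measures agreement with the blob labels under both possible label orderings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two slightly overlapping blobs. Centres and spread are assumptions
# chosen for illustration, not the data from the question.
X, y_true = make_blobs(n_samples=500, centers=[[-1.5, 0.0], [1.5, 0.0]],
                       cluster_std=1.0, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# On the training data, predict() should reproduce the labels from fit().
same_as_fit = bool(np.array_equal(km.predict(X), km.labels_))

# Predictions on a lattice should follow the nearest-centroid rule.
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-4, 4, 50))
grid = np.c_[xx.ravel(), yy.ravel()]
dists = np.linalg.norm(grid[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
grid_match = float(np.mean(km.predict(grid) == dists.argmin(axis=1)))

# Cluster indices are arbitrary, so compare with the blob labels both ways.
agree = float(np.mean(km.labels_ == y_true))
agree_flipped = float(np.mean((1 - km.labels_) == y_true))
best_agreement = max(agree, agree_flipped)

print(same_as_fit, grid_match, round(best_agreement, 3))
```

If `same_as_fit` is true but the scatter plot still disagrees with the background, the scatter points are probably coloured by the original blob labels (or by swapped cluster indices) rather than by `km.labels_`.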

Kirill