I have two blobs of points in a 2D plane that overlap slightly. When I fit KMeans with 2 clusters and colour the plane using the prediction for each point on a lattice, the result looks very different from the original dataset's classification.
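A minimal sketch of this kind of setup, assuming scikit-learn's `make_blobs` and `KMeans`. The blob centres, spread, sample count, and lattice resolution are illustrative assumptions, not the exact values behind the plot described here. It also computes two agreement numbers that may help narrow things down: how often the fitted labels match `predict` on the same points, and how often the clusters match the generating blobs (allowing for the two cluster ids being swapped).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two slightly overlapping blobs; centres and cluster_std are illustrative guesses.
X, y_true = make_blobs(
    n_samples=500,
    centers=[[-1.0, 0.0], [1.0, 0.0]],
    cluster_std=1.0,
    random_state=0,
)

# Fit KMeans with 2 clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Build a lattice covering the data and predict a cluster for every lattice point.
xs = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200)
ys = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200)
xx, yy = np.meshgrid(xs, ys)
grid = np.column_stack([xx.ravel(), yy.ravel()])
grid_pred = km.predict(grid).reshape(xx.shape)

# Fraction of training points whose fitted label equals predict() on the same point.
fit_vs_predict = np.mean(km.labels_ == km.predict(X))

# Fraction of points whose cluster matches the generating blob,
# taking the better of the two possible label assignments (0/1 may be swapped).
agree = np.mean(km.labels_ == y_true)
blob_vs_cluster = max(agree, 1 - agree)

print(fit_vs_predict, blob_vs_cluster)
```

To get the picture, `grid_pred` can be drawn with matplotlib's `contourf(xx, yy, grid_pred)` and the points scattered on top, coloured either by `y_true` (the generating blob) or by `km.labels_` (the fitted cluster). Which of the two is used for the scatter colours changes what a "mismatch" with the background means.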
My guess is that KMeans struggles to find the two clusters here, so the predictions aren't very reliable. What I find weird is that so many data points are clustered as purple but then predicted as yellow, and vice versa.
So what is the explanation for this?
If I increase the separation between the two blobs, the effect is much less noticeable, but it is still there.
Any ideas?