2

I am trying to create a plot similar to this:

enter image description here

Here there are three clusters and all the datapoints (circles) are plotted according to their euclidean distance from the centroid. Using this image its easy to see that 5 samples from class 2 ended up in wrong clusters.

I'm running k-means using kmeans and can't figure out how to plot this type of graph.

For example purposes we can use the iris dataset.

> iri <- iris
> cl <- kmeans (iri[, 1:4], 3)
> cl
K-means clustering with 3 clusters of sizes 38, 62, 50

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     6.850000    3.073684     5.742105    2.071053
2     5.901613    2.748387     4.393548    1.433871
3     5.006000    3.428000     1.462000    0.246000

Clustering vector:
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [40] 3 3 3 3 3 3 3 3 3 3 3 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
 [79] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1
[118] 1 1 2 1 2 1 2 1 1 2 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 2 1 1 2

Within cluster sum of squares by cluster:
[1] 23.87947 39.82097 15.15100
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      

Source for the image: https://github.com/michaelwsherman/winecluster The originator doesn't seem to be using kmeans though.

I suspect this might not be doable using kmeans for it does not provide distance from centroid. Is there another way to show data in this way or something similar?

Anthony
  • 33,838
  • 42
  • 169
  • 278

0 Answers0