
I have made a cluster plot in R and want to choose the number of clusters with the "elbow criterion", using a WSS (within-cluster sum of squares) plot. I drew a WSS plot for my clustering, but it looks really strange and I do not know where the elbow is, i.e. how many clusters I should use. Could anyone help me?

Here is my data:

Friendly<-c(0.533,0.854,0.9585,0.925,0.9125,0.9815,0.9645,0.981,0.9935,0.9585,0.996,0.956,0.9415)
Polite<-c(0,0.45,0.977,0.9915,0.929,0.981,0.9895,0.9875,1,0.96,0.996,0.873,0.9125)
Praising<-c(0,0,0.437,0.9585,0.9415,0.9605,0.998,0.998,0.8915,1,1,1,0.977)
Joking<-c(0,0,0,0.617,0.942,0.9665,0.9935,0.992,0.935,0.987,0.975,0.9915,0.9665)
Sincere<-c(0,0,0,0,0.617,0.8335,0.985,0.9895,0.977,0.9205,1,0.9585,0.8895)
Serious<-c(0,0,0,0,1,0.642,0.975,0.9605,0.9645,0.9895,0.8125,0.9605,0.925)
Hostile<-c(0,0,0,0,0,0,0.629,0.656,0.948,0.9705,0.9645,0.998,0.9685)
Rude<-c(0,0,0,0,0,0,0,0.687,0.979,0.954,0.954,0.996,0.956)
Irony<-c(0,0,0,0,0,0,0,0,0.354,0.9815,0.996,1,0.971)
Insincere<-c(0,0,0,0,0,0,0,0,1,0.396,0.996,0.9915,0.9415)
Commanding<-c(0,0,0,0,0,0,0,0,0,1,0.462,0.9605,0.9165)
Suggesting<-c(0,0,0,0,0,0,0,0,0,0,0,0.867,0.775)
Neutral<-c(0,0,0,0,0,0,0,0,0,0,0,0,0.283)

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Irony,Insincere,Commanding,Suggesting,Neutral)
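Note that `as.dist()` keeps only the entries strictly below the diagonal of the matrix it is given; the diagonal and everything above it are dropped. With this data that means, for example, `Friendly[1]` (0.533, on the diagonal) and `Serious[5]` (1, above the diagonal) play no part in the clustering. A toy check of which entries survive:

```r
# as.dist() takes m[row(m) > col(m)], i.e. the strictly lower triangle,
# read column by column; the diagonal and the upper triangle are ignored.
m  <- matrix(1:9, nrow = 3)
d3 <- as.dist(m)
print(d3)
as.numeric(d3)   # 2 3 6, i.e. m[2, 1], m[3, 1], m[3, 2]
```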

And here is my clustering code; the WSS method is the one Gavin gives in the last line of his answer to "How to draw the plot of within-cluster sum-of-squares for a cluster?":

##cluster analysis
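d <- as.dist(data)   # only the lower triangle of `data` is used as dissimilarities
hc <- hclust(d, method = "average")
plot(hc, main = "", sub = 'Method="Average"', ann = TRUE, axes = TRUE, hang = 0.2)
##draw a wss plot
## wss() and wrap() are the helpers from the linked answer (definitions assumed here so the code runs)
wss <- function(x) sum(scale(x, scale = FALSE)^2)  # squared deviations from the group means
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)      # cut the tree into i groups
  spl <- split(x, cl)      # split the rows of x by group
  sum(sapply(spl, wss))    # WSS summed over the groups
}
res <- sapply(seq.int(1, 13), wrap, hc = hc, x = data)
plot(seq_along(res), res, type = "b", pch = 19, xlab = "Number of clusters", ylab = "WSS")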

But the plot looks like this. Can anyone explain why this happened and how to decide on the "elbow criterion"?

[Image: WSS plot for the clustering above]
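The `wrap()` approach computes WSS from the rows of `data`, so it treats each row of this dissimilarity table as a set of coordinates, while `hclust()` only ever sees the dissimilarities. One possible alternative (a swapped-in technique, not the linked answer's method) is to compute the within-cluster dispersion directly from the dissimilarities, using W = sum over clusters k of (1 / n_k) times the sum of d_ij^2 over pairs i < j in cluster k. For Euclidean distances this equals the usual WSS exactly; for other dissimilarities, such as the ones here, it is only a pseudo-WSS. The helper name `wss_from_dist()` is hypothetical, and the built-in `USArrests` data is used only to check the identity:

```r
# Hypothetical helper: within-cluster dispersion from a dissimilarity object alone,
#   W = sum_k (1 / n_k) * sum_{i<j in cluster k} d_ij^2
# Equals the coordinate-based WSS when d is Euclidean; otherwise a pseudo-WSS.
wss_from_dist <- function(d, cl) {
  m <- as.matrix(d)                          # full symmetric matrix, zero diagonal
  sum(sapply(split(seq_along(cl), cl), function(idx) {
    sum(m[idx, idx]^2) / (2 * length(idx))   # every pair appears twice in m, hence the 2
  }))
}

# Check against the coordinate-based WSS on built-in data
x  <- scale(USArrests)
hc <- hclust(dist(x), method = "average")
cl <- cutree(hc, k = 4)
w_dist  <- wss_from_dist(dist(x), cl)
w_coord <- sum(sapply(split(as.data.frame(x), cl),
                      function(g) sum(scale(g, scale = FALSE)^2)))
all.equal(w_dist, w_coord)                   # TRUE up to floating-point tolerance
```

With the question's objects, something like `sapply(1:13, function(k) wss_from_dist(as.dist(data), cutree(hc, k)))` would give the corresponding curve.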

Ping Tang

1 Answer


Why do you expect that WSS will decline smoothly with increasing numbers of clusters? It need not, as you found out. Only with well-behaved data have I seen nicely behaved scree plots.
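One thing that does hold: when every k comes from cutting the same tree, the partition for k + 1 splits one group of the partition for k, and splitting a group can never increase its sum of squares. So a WSS curve built this way never rises, but nothing forces the successive drops to shrink smoothly. A sketch on the built-in `USArrests` data (an assumption, standing in for the question's data); the helper name `wss_k()` is hypothetical:

```r
# WSS for k = 1..n groups, all cut from one average-linkage tree
wss   <- function(g) sum(scale(g, scale = FALSE)^2)   # squared deviations from group means
wss_k <- function(k, hc, x) sum(sapply(split(x, cutree(hc, k)), wss))

x   <- as.data.frame(scale(USArrests))   # built-in data, standardized
hc  <- hclust(dist(x), method = "average")
res <- sapply(seq_len(nrow(x)), wss_k, hc = hc, x = x)

drops <- -diff(res)   # drops[k] = WSS(k) - WSS(k + 1), never negative
plot(seq_along(res), res, type = "b", pch = 19, xlab = "k", ylab = "WSS")
round(drops[1:10], 2) # sizes of the first few drops
```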

There is a big drop in the WSS at 7 clusters, which might suggest you want to stop there. However, you should also look at the dendrogram when you evaluate this.
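Checking a candidate k against the dendrogram can be done with `rect.hclust()` from the stats package, which draws boxes around the k groups on an already plotted tree. A minimal sketch, with `USArrests` standing in for the question's data and k = 7 taken from the answer purely for illustration:

```r
x  <- scale(USArrests)                      # built-in data, standardized
hc <- hclust(dist(x), method = "average")
k  <- 7                                     # candidate number of clusters
plot(hc, main = "", sub = 'Method="Average"', hang = 0.2)
rect.hclust(hc, k = k, border = "red")      # box the k groups on the dendrogram
groups <- cutree(hc, k = k)
table(groups)                               # group sizes; singleton groups deserve a second look
```

With the question's objects, calling `rect.hclust(hc, k = 7)` right after the existing `plot(hc, ...)` line does the same.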

Gavin Simpson
Ok, I got it. However, I saw someone label the y-axis of their WSS plot as "Between-Inertia", which I assume has a similar meaning, right? But what exactly does Between-Inertia mean? Thank you. – Ping Tang Sep 23 '14 at 00:44
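If "between-inertia" refers to the between-cluster sum of squares (some software divides it by the number of observations and calls it inertia), it is the complement of the WSS rather than a synonym: for any partition, total SS = WSS + between SS, so as the WSS falls the between part rises by the same amount. A sketch checking that decomposition, assuming the built-in `USArrests` data and an arbitrary k = 4:

```r
x  <- scale(USArrests)
cl <- cutree(hclust(dist(x), method = "average"), k = 4)

tss <- sum(scale(x, scale = FALSE)^2)       # total sum of squares about the grand mean
wss <- sum(sapply(split(as.data.frame(x), cl),
                  function(g) sum(scale(g, scale = FALSE)^2)))   # within-cluster SS

centers <- apply(x, 2, function(col) tapply(col, cl, mean))   # one row per cluster
n_k     <- as.vector(table(cl))                               # cluster sizes
grand   <- colMeans(x)
bss <- sum(n_k * rowSums(sweep(centers, 2, grand)^2))         # between-cluster SS

all.equal(tss, wss + bss)                   # TSS = WSS + BSS
```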