I use the k-medoids algorithm pam to cluster based on the (symmetric) distance matrix tmp below:
if(!require("cluster")) { install.packages("cluster"); require("cluster") }
tmp <- matrix(c( 0, 20, 20, 20, 40, 60, 60, 60, 100, 120, 120, 120,
20, 0, 20, 20, 60, 80, 40, 80, 120, 100, 140, 120,
20, 20, 0, 20, 60, 80, 80, 80, 120, 140, 140, 80,
20, 20, 20, 0, 60, 80, 80, 80, 120, 140, 140, 140,
40, 60, 60, 60, 0, 20, 20, 20, 60, 80, 80, 80,
60, 80, 80, 80, 20, 0, 20, 20, 40, 60, 60, 60,
60, 40, 80, 80, 20, 20, 0, 20, 60, 80, 80, 80,
60, 80, 80, 80, 20, 20, 20, 0, 60, 80, 80, 80,
100, 120, 120, 120, 60, 40, 60, 60, 0, 20, 20, 20,
120, 100, 140, 140, 80, 60, 80, 80, 20, 0, 20, 20,
120, 140, 140, 140, 80, 60, 80, 80, 20, 20, 0, 20,
120, 120, 80, 140, 80, 60, 80, 80, 20, 20, 20, 0),
nrow = 12, dimnames = list(LETTERS[1:12], LETTERS[1:12]))
tmp_pam <- pam(as.dist(tmp, diag = TRUE, upper = TRUE) , k=3)
tmp_pam$clusinfo # get cluster info
tmp_pam$silinfo # get silhouette information
clusplot(tmp_pam)
I have read here that clusplot uses cmdscale and princomp, which makes sense. However, the order of the operations is not given.
How can I get the Component1 and Component2 coordinates, along with their cluster labels and point IDs, from the output of clusplot? I want access to these so that I can modify and plot them in ggplot.
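To make the goal concrete, here is a minimal sketch of the kind of data frame I am after. It assumes (unverified) that for a dissimilarity input clusplot projects the points with classical multidimensional scaling via cmdscale. As far as I can tell clusplot may also add a constant (add = TRUE), so these coordinates need not match its axes exactly. The toy data and the names toy, xy, and plot_df are made up for illustration and stand in for tmp.

```r
library(cluster)

# Hypothetical toy data standing in for tmp: three well-separated groups
toy <- rbind(c(0, 0),  c(1, 0),  c(0, 1),
             c(10, 10), c(11, 10), c(10, 11),
             c(20, 0), c(21, 0), c(20, 1))
rownames(toy) <- LETTERS[1:9]
d <- dist(toy)

# Same clustering call as in the question, on the toy dissimilarities
fit <- pam(d, k = 3)

# Assumption: classical MDS to two dimensions approximates what clusplot draws
xy <- cmdscale(d, k = 2)

# One row per point: ID, two coordinates, and the pam cluster label
# (fit$clustering is in the same order as the input observations)
plot_df <- data.frame(
  id         = rownames(xy),
  Component1 = xy[, 1],
  Component2 = xy[, 2],
  cluster    = factor(as.vector(fit$clustering))
)
print(plot_df)
```

A data frame like plot_df could then be passed to ggplot, for example with aes(Component1, Component2, colour = cluster, label = id) and geom_text(), assuming ggplot2 is installed.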
I guess the plotting is somehow related to the silhouette information, but I do not quite understand how we get to the final plot below: