To evaluate the stability of a classification/clustering solution, I am running 1,000 bootstraps of the algorithm on my data. Across these classification outcomes, I would like to count how often each pair of observations occurs in the SAME cluster. I am clustering about 250 observations, which gives about 31k such pairs (250 × 249 / 2 = 31,125).
This R code generates a small synthetic data set with three runs instead of 1,000:
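set.seed(1)
# 250 observation IDs: "ID1", ..., "ID250"
ID <- paste0("ID", 1:250)
# Three example runs, each assigning every observation to one of 5 clusters
cluster1 <- sample(1:5, 250, replace = TRUE)
cluster2 <- sample(1:5, 250, replace = TRUE)
cluster3 <- sample(1:5, 250, replace = TRUE)
df <- data.frame(ID, cluster1, cluster2, cluster3)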
You will see that ID3 and ID4 appear in the same cluster in two of the three runs.
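As with all classifications, the integer used to denote cluster membership is arbitrary, so labels cannot be compared across runs. Only whether two observations share a label within the same run matters.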
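To make the desired output concrete, here is a minimal sketch of one way the counts could be computed in base R. It is not meant as the definitive approach. The names `labels`, `co`, and `pairs` are my own, and it assumes the runs are stored as columns, as in `df` above. Because each run is only compared with itself via `==`, the arbitrary label integers do not matter.

```r
# Rebuild the synthetic data from above so this block runs on its own
set.seed(1)
ID <- paste0("ID", 1:250)
cluster1 <- sample(1:5, 250, replace = TRUE)
cluster2 <- sample(1:5, 250, replace = TRUE)
cluster3 <- sample(1:5, 250, replace = TRUE)
df <- data.frame(ID, cluster1, cluster2, cluster3)

# Observations x runs matrix of cluster labels (drop the ID column)
labels <- as.matrix(df[, -1])

# For each run, outer(x, x, "==") is a 250 x 250 logical matrix that is TRUE
# where two observations share a cluster; summing over runs gives the counts.
co <- Reduce(`+`, lapply(seq_len(ncol(labels)), function(k) {
  outer(labels[, k], labels[, k], "==")
}))
dimnames(co) <- list(df$ID, df$ID)

# Long format: one row per unordered pair, taken from the upper triangle
idx <- which(upper.tri(co), arr.ind = TRUE)
pairs <- data.frame(ID1 = df$ID[idx[, 1]],
                    ID2 = df$ID[idx[, 2]],
                    count = co[idx])

co["ID3", "ID4"]   # co-occurrence count for one specific pair
nrow(pairs)        # number of unordered pairs
```

With 1,000 runs the same code would apply to a 250 x 1,000 label matrix. Dividing `co` by the number of runs would give the proportion of runs in which each pair is clustered together.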