I have a 1609 × 1609 distance matrix whose values range from 0 to 1. How can I use this matrix to find the natural number of clusters?

I know SPSS has a TwoStep cluster function that can determine the number of clusters, but its input must be a list of variables. I only have a distance matrix, so I do not think I can use TwoStep clustering in SPSS.

I tried hclust in R, but it does not give me the number of clusters. I also tried NbClust, but I do not know what to pass as the data matrix, because I only have a dissimilarity matrix.

The sample data is as follows.

diss_matrix<-matrix(
  c(0,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0.25,0.75,0.916666667,0.75,
            0.916666667,0,0.916666667,0.916666667,0.916666667,0.916666667,0.75,0.25,0.916666667,0.25,
            0.916666667,0.916666667,0,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,
            0.916666667,0.916666667,0.916666667,0,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,
            0.916666667,0.916666667,0.916666667,0.916666667,0,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,
            0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0,0.916666667,0.916666667,0.916666667,0.916666667,
            0.25,0.75,0.916666667,0.916666667,0.916666667,0.916666667,0,0.5,0.916666667,0.75,
            0.75,0.25,0.916666667,0.916666667,0.916666667,0.916666667,0.5,0,0.916666667,0.25,
            0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0.916666667,0,0.916666667,
            0.75,0.25,0.916666667,0.916666667,0.916666667,0.916666667,0.75,0.25,0.916666667,0),
          nrow=10,
          ncol=10,              
          byrow = TRUE)

dimnames(diss_matrix) = list( 
    paste0("A", 1:10),# row names 
    paste0("A", 1:10)) # column names 
diss_matrix

I used hclust to draw the dendrogram, but this is not what I want.

library(stats)  # hclust() is part of the 'stats' package that ships with R; nothing to install
diss_matrix2<-as.dist(diss_matrix, diag = FALSE, upper = FALSE)
fit <- hclust(diss_matrix2, method="ward.D")
plot(fit)

I want the number of groups to be determined automatically, so I tried NbClust.

library(NbClust)    
NbClust(data = "NULL", diss = diss_matrix, distance ="NULL", min.nc = 2, max.nc = 15,  method = "ward", index = "all", alphaBeale = 0.1)

But it fails with:

Error in t(jeu) %*% jeu : 
  requires numeric/complex matrix/vector arguments

Thanks in advance.

Terence Tien

1 Answer

From a statistician's perspective, I recommend moving away from what you are trying to do and using a less heuristic method instead.

Look at the mclust package for a good example of model-based clustering.

Some general examples of clustering methods in R are provided in the link below:

http://www.statmethods.net/advstats/cluster.html

Everitt et al. (http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002266.html) discuss some of the methods used by the mclust R package. Try the example below.

library(mclust)

data("iris")

# Fit on the four numeric measurements only; Species (column 5) is a factor label
fit1 <- Mclust(iris[, 1:4])

plot(fit1)

summary(fit1)

fit1$classification

# Attach the cluster labels to the data under a readable column name
df <- cbind(iris, cluster = fit1$classification)

head(df)

I believe you wanted the classification along with your data, which the above code should provide.
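Note that Mclust expects the observations themselves, not a dissimilarity matrix. If a dissimilarity matrix is all you have, one possible alternative is to cut an hclust tree at several values of k and keep the k with the highest average silhouette width. The following is only a minimal sketch under assumptions: the toy matrix d, the range of k, the "average" linkage, and the use of silhouette() from the cluster package are illustrative choices of mine, not something taken from the question.

```r
library(cluster)  # provides silhouette()

# Hypothetical toy dissimilarity matrix: two tight groups (objects 1-3 and 4-6)
d <- matrix(0.9, nrow = 6, ncol = 6)
d[1:3, 1:3] <- 0.1
d[4:6, 4:6] <- 0.1
diag(d) <- 0
dd <- as.dist(d)

# Hierarchical clustering directly on the dissimilarities
fit <- hclust(dd, method = "average")

# Average silhouette width for each candidate k (k must be below the number of objects)
ks <- 2:5
avg_sil <- sapply(ks, function(k) {
  cl <- cutree(fit, k = k)
  mean(silhouette(cl, dd)[, "sil_width"])
})

# Keep the k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
groups <- cutree(fit, k = best_k)
best_k
groups
```

With your own data you would replace dd with as.dist(diss_matrix) and widen ks as needed. The silhouette criterion is itself a heuristic, so treat the selected k as a suggestion to check against the dendrogram rather than a definitive answer.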

Best of luck

Jon