
I started doing a hierarchical classification on mixed data using "hclust". However, the results do not contain any details about variable importance for each cluster. That is why I did the classification again using "HCPC", which gives all these details.

My question is the following: why are the results of the two hierarchical classifications different? (For example, the first classification puts 881 individuals in the first cluster, whereas the second puts 679 individuals in the first cluster.)

dtf.year <- read.table(file="studies/dtf.year.txt", sep="\t", header=T)

# hclust on a factor analysis of mixed data (ade4)
library(ade4)
year.afdm <- dudi.mix(dtf.year, scannf=F, nf=2)
dist.year <- dist(year.afdm$li[,1:2], method="euclidean")
dist.year <- dist.year^2  # ward.D expects squared Euclidean distances
year.tree <- hclust(dist.year, method="ward.D") # I also tried ward.D2
year.clusters <- cutree(year.tree, k=3)
table(year.clusters)

>   1   2   3 
881 225 535 
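
As an aside, a small self-contained R sketch (my assumption, not part of the original post; the toy matrix `x` stands in for `year.afdm$li`) of why the squaring step matters: `ward.D` expects squared Euclidean distances as input, while `ward.D2` squares the dissimilarities internally, so both should produce the same merge order and the same `cutree` partition, differing only in merge heights.

```r
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)   # toy data in place of year.afdm$li
d <- dist(x, method = "euclidean")

tree.D  <- hclust(d^2, method = "ward.D")   # squared distances in
tree.D2 <- hclust(d,   method = "ward.D2")  # squaring done internally

# Same partition at k = 3; only the merge heights differ
identical(cutree(tree.D, k = 3), cutree(tree.D2, k = 3))
```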

# HCPC on a factor analysis of mixed data (FactoMineR)
library(FactoMineR)
year.afdm <- FAMD(dtf.year, ncp=2)
year.tree2 <- HCPC(year.afdm, method="ward", order=FALSE)
table(year.tree2$data.clust$clust)

>   1   2   3 
679 267 695

Any help is welcome!

Best wishes, Tang

    Sorry, I just found the solution. The HCPC function performs a k-means consolidation by default. If we set the parameter consol=F, this consolidation is not applied and we get the same results as the hclust function: HCPC(year.afdm, consol=F, method="ward", order=F) instead of HCPC(year.afdm, method="ward", order=FALSE). – tang Jun 01 '15 at 11:43
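
Putting the fix from the comment above into a complete call (a sketch adapting the question's own code, untested on the original data; `nb.clust=3` and `graph=FALSE` are my additions to force the same three-cluster cut non-interactively):

```r
library(FactoMineR)

# FAMD as in the question, suppressing plots
year.afdm <- FAMD(dtf.year, ncp = 2, graph = FALSE)

# consol = FALSE skips the k-means consolidation step, so the cut of the
# Ward tree should match the hclust partition sizes
year.tree2 <- HCPC(year.afdm, nb.clust = 3, consol = FALSE,
                   method = "ward", order = FALSE, graph = FALSE)

table(year.tree2$data.clust$clust)
```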

0 Answers