I'm trying to perform a cluster analysis on mixed data (demographic variables plus preferences rated on Likert scales from 1 to 10). I am applying hierarchical clustering to a dissimilarity matrix computed with daisy(), which handles mixed data, but when I compute the goodness of fit (the cophenetic correlation), the score is 0.60, which is not very high.

How can I improve the goodness of fit? Is a hierarchical method suitable for this data? Should the Likert scale data be treated as factors or as numeric? Also, in the call hclust(seg.dist, method="complete"), is the "complete" linkage method suitable for my data?

I also tried Latent Class Analysis, but the results were not interesting (unless I was doing it wrong).

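library(cluster)                                # provides daisy()
seg.dist <- daisy(EUR_data)                     # dissimilarities between respondents
as.matrix(seg.dist)                             # inspect the full dissimilarity matrix
seg.hc <- hclust(seg.dist, method="complete")   # complete-linkage hierarchical clustering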

To calculate the cophenetic correlation:

cor(cophenetic(seg.hc), seg.dist)

Michcio
Irau

1 Answer

Improve preprocessing of your data.

Some attributes will be more important than others, so they should not all contribute equally to the distance.

Likert attributes also often cannot be treated as interval-scaled, because the steps between adjacent values are not necessarily equal: for cultural reasons, people may be less likely to give a 7 than a 6 or an 8 (7 is bad luck).
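To see how much the coding of the Likert items matters, one option is to build the dissimilarities both ways and compare the cophenetic correlation. The following is only a sketch: the data frame `toy` and its column names are made up as a stand-in for EUR_data, and the helper `coph()` is hypothetical. With numeric columns, Gower's coefficient uses range-scaled differences; with unordered factors it only checks whether two answers are equal.

```r
library(cluster)  # daisy()

# Hypothetical toy data standing in for EUR_data (column names are assumptions)
set.seed(42)
n <- 30
toy <- data.frame(
  gender = factor(sample(c("f", "m"), n, replace = TRUE)),
  age    = round(runif(n, 18, 70)),
  pref_a = sample(1:10, n, replace = TRUE),
  pref_b = sample(1:10, n, replace = TRUE)
)
likert_cols <- c("pref_a", "pref_b")

# Variant 1: Likert items kept numeric (range-scaled differences)
d_num <- daisy(toy, metric = "gower")

# Variant 2: Likert items as unordered factors (simple matching: equal or not)
toy_fac <- toy
toy_fac[likert_cols] <- lapply(toy_fac[likert_cols], factor)
d_fac <- daisy(toy_fac, metric = "gower")

# Hypothetical helper: cophenetic correlation for a given dissimilarity and linkage
coph <- function(d, method = "complete") {
  hc <- hclust(d, method = method)
  cor(cophenetic(hc), d)
}

# Compare the two codings under the same linkage
c(numeric = coph(d_num), factor = coph(d_fac))
```

Neither coding is right by default; the comparison only shows how sensitive the result is to that choice on your own data.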

Clustering will only be as good as your distance, so improve your preprocessing and distance computations!
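As a rough illustration of working on the distance itself, daisy() accepts a `weights` argument when Gower's coefficient is used, so more important attributes can count for more. The sketch below uses made-up data and arbitrary weights (both are assumptions, not recommendations) and then checks how the cophenetic correlation changes across a few hclust() linkage methods.

```r
library(cluster)  # daisy()

# Hypothetical toy data standing in for EUR_data (column names are assumptions)
set.seed(7)
n <- 30
toy <- data.frame(
  gender = factor(sample(c("f", "m"), n, replace = TRUE)),
  age    = round(runif(n, 18, 70)),
  pref_a = sample(1:10, n, replace = TRUE),
  pref_b = sample(1:10, n, replace = TRUE)
)

# Illustrative weights, one per column in the order of names(toy):
# here the two preference items count double (an arbitrary assumption)
w <- c(1, 1, 2, 2)
d_w <- daisy(toy, metric = "gower", weights = w)

# Cophenetic correlation for several linkage methods on the same dissimilarity
methods <- c("complete", "average", "single")
coph_by_method <- sapply(methods, function(m) {
  hc <- hclust(d_w, method = m)
  cor(cophenetic(hc), d_w)
})
round(coph_by_method, 3)
```

A higher cophenetic correlation only means the dendrogram preserves the chosen dissimilarities better; it does not say whether those dissimilarities are meaningful, which is why the weights and preprocessing deserve attention first.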

Has QUIT--Anony-Mousse