How to decide on the clustering method for categorical data in R?

Question

I'm trying to perform a cluster analysis on mixed data (demographic variables plus preferences rated on 1-to-10 Likert scales). I am applying hierarchical clustering to a dissimilarity matrix computed with daisy(), which handles mixed data, but when I compute the goodness of fit (the cophenetic correlation), the score is 0.60, which is not very high.

How can I improve the goodness of fit? Is a hierarchical method suitable for this data? Should the Likert scale data be treated as factors or as numeric? Also, is the complete linkage used in hclust(seg.dist, method="complete") suitable for my data?

I also tried Latent Class Analysis, but the results were not interesting (unless I was doing it wrong).

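library(cluster)                # provides daisy()
seg.dist <- daisy(EUR_data)     # Gower dissimilarity is used automatically when columns are of mixed type
as.matrix(seg.dist)             # print the full dissimilarity matrix for inspection
seg.hc <- hclust(seg.dist, method = "complete")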

To calculate the cophenetic correlation:

cor(cophenetic(seg.hc), seg.dist)
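To check whether complete linkage is a reasonable choice, one option is to compute the same cophenetic correlation for several linkage methods on the same dissimilarity matrix. This is a minimal sketch that uses the `flower` data set shipped with the cluster package as a stand-in for EUR_data (which isn't shown here); the list of methods is an arbitrary selection.

```r
library(cluster)

# 'flower' is a small mixed-type data set from the cluster package,
# used here only as a stand-in for EUR_data
data(flower, package = "cluster")
d <- daisy(flower)  # Gower dissimilarity, since the columns are of mixed type

methods <- c("complete", "average", "single")

# Cophenetic correlation for each linkage method on the same dissimilarities
coph_cor <- sapply(methods, function(m) {
  hc <- hclust(d, method = m)
  cor(as.vector(cophenetic(hc)), as.vector(d))
})

round(coph_cor, 3)
```

A higher value only means the dendrogram preserves the pairwise dissimilarities more faithfully; it does not by itself mean the resulting clusters are more useful.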

Answer (answered Oct 12 '19 at 11:19)

Improve the preprocessing of your data.

Some attributes will be more important than others, so letting every variable contribute equally to the distance is rarely ideal.
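If you know which attributes matter more, daisy() accepts a `weights` argument (one weight per column) that is used in the Gower coefficient instead of equal weights. A minimal sketch on a hypothetical toy data frame; the column names and the weight of 2 on the preference column are made up for illustration:

```r
library(cluster)

# Hypothetical toy data standing in for EUR_data
toy <- data.frame(
  gender = factor(c("f", "m", "f", "m")),  # nominal demographic
  age    = c(23, 35, 41, 58),              # numeric demographic
  pref_a = c(2, 9, 3, 8)                   # Likert preference, treated as numeric here
)

# Equal weights (Gower's original formula)
d_equal <- daisy(toy, metric = "gower")

# Hypothetical weights: pref_a counts twice as much as each demographic column
d_weighted <- daisy(toy, metric = "gower", weights = c(1, 1, 2))

round(as.matrix(d_equal), 3)
round(as.matrix(d_weighted), 3)
```

For observations 1 and 2, gender differs (contribution 1), age differs by 12 over a range of 35, and pref_a differs by the full range (contribution 1), so the equal-weight dissimilarity is (1 + 12/35 + 1) / 3 and the weighted one is (1 + 12/35 + 2) / 4.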

Likert attributes often cannot be treated as interval-scaled either, because the steps between scores are not necessarily equal: for example, people may be less likely to give a 7 than a 6 or an 8 for cultural reasons (7 is bad luck).
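One way to avoid assuming equal spacing between Likert scores is to replace each raw score with a score derived from how respondents actually used the scale, before computing distances. The sketch below uses ridit scoring (share of respondents below a score plus half the share at that score). It is only one option (declaring the columns as ordered factors for daisy() is another), and `ridit()` and `likert_cols` are hypothetical names.

```r
# Hypothetical helper: ridit score of each response, based on the
# observed distribution of scores in that column
ridit <- function(x) {
  scores <- sort(unique(x[!is.na(x)]))                    # observed scores, ascending
  counts <- tabulate(match(x, scores), nbins = length(scores))
  p <- counts / sum(counts)                               # share of respondents per score
  below <- cumsum(p) - p                                  # share strictly below each score
  (below + p / 2)[match(x, scores)]                       # NA responses stay NA
}

# A rarely used 7 between frequent 6s and 8s
round(unique(ridit(c(6, 6, 6, 6, 7, 8, 8, 8, 8))), 3)     # 0.222 0.500 0.778

# Hypothetical usage on the Likert columns before calling daisy():
# EUR_data[likert_cols] <- lapply(EUR_data[likert_cols], ridit)
```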

Clustering will only be as good as your distance, so improve your preprocessing and distance computations!
