Weighted Cluster Analysis in R — generating more clusters than requested with hclust

Question

I'm trying to conduct a hierarchical agglomerative cluster analysis in R using the WeightedCluster package. Before doing so, I calculated the distances between state sequences with the TraMineR package (see pp. 4-6 here).

Following the vignette hyperlinked above, I fed my distance matrix into hclust while adding a vector of weights as follows (datadist is the distance matrix; dataframe is my data frame featuring time series data; and weight is an all-waves longitudinal survey weight):

 Cluster <- hclust(as.dist(datadist), method = "ward", members = dataframe$weight)
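Because the original data cannot be shared, here is a minimal sketch of the same weighted call on synthetic stand-ins. The 12 random points, their distance matrix and the weights are made up for illustration and are not the asker's data. In R's hclust, members is documented as the number of observations per object, which is how the WeightedCluster vignette uses it to pass case weights. The sketch uses method = "ward.D", the name that current versions of R give to the method the question calls "ward". hclust still accepts "ward" but renames it to "ward.D" with a message. members needs exactly one entry per case in the distance matrix.

```r
# Synthetic stand-ins (hypothetical): 12 cases, a Euclidean distance matrix,
# and one positive weight per case, mimicking datadist and dataframe$weight.
set.seed(42)
n <- 12
coords <- matrix(rnorm(n * 2), ncol = 2)
datadist <- as.matrix(dist(coords))
weight <- runif(n, min = 0.5, max = 2)

# Weighted Ward clustering: 'members' supplies one weight per row of datadist.
# "ward.D" is the current name of the method the question calls "ward".
Cluster <- hclust(as.dist(datadist), method = "ward.D", members = weight)

# Cut the tree into four clusters and count the cases in each one
subgroups <- cutree(Cluster, k = 4)
table(subgroups)
```

When k is given, cutree() cuts using the merge order of the tree, so on a valid hclust object it should return exactly k distinct labels.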

Then, after settling on a specific cluster solution (four subgroups), I used the cutree function to assign cases to clusters and determine the relative frequency of each cluster:

 subgroups <- cutree(Cluster, k = 4)
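cutree() only assigns each case a cluster label. The relative frequencies come from tabulating those labels afterwards. With survey weights, a weighted share sums the weights within each cluster instead of counting cases. The following is a sketch using made-up labels and weights. These values are hypothetical and stand in for the cutree() output and dataframe$weight.

```r
# Hypothetical cluster labels and weights standing in for cutree() output
# and dataframe$weight.
subgroups <- c(1, 1, 2, 3, 3, 3, 4, 4)
weight <- c(0.5, 1.5, 2, 1, 1, 1, 0.5, 2.5)

# Unweighted share of cases per cluster: 0.25, 0.125, 0.375, 0.25
unweighted_share <- prop.table(table(subgroups))

# Weighted share: sum the weights within each cluster, then normalise
# so the shares add up to 1: 0.2, 0.2, 0.3, 0.3
weighted_share <- prop.table(tapply(weight, subgroups, sum))
```

Both tables have one entry per distinct label in subgroups. If either table has more than four entries, the extra groups are already present in the cutree() output itself and are not introduced by the weighting step.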

However, the code above somehow produced more than four groups (over 30, in fact). When I removed the vector of weights, I was able to produce frequencies for four clusters, but unweighted results are suboptimal.
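Since the problem only appears with weights, one might start by checking the weight vector itself. The helper below, check_weights, is hypothetical, and the checks are a guess at where to look, not a diagnosis. hclust already stops with an error on some of these problems, such as a length mismatch, so those checks are only a precaution. Zero weights are flagged for a different reason. With Ward's method, the Lance-Williams update divides by the summed weights of the clusters involved, and that sum is zero when all of those clusters have zero weight. This is an assumption about where trouble could arise, not a confirmed cause.

```r
# Hypothetical helper: basic checks on a weight vector before passing it
# to hclust(members = ...). Returns a character vector of problems found.
check_weights <- function(weights, distmat) {
  problems <- character(0)
  if (length(weights) != nrow(distmat))
    problems <- c(problems, "length differs from number of cases in the distance matrix")
  if (anyNA(weights))
    problems <- c(problems, "contains NA values")
  if (any(weights <= 0, na.rm = TRUE))
    problems <- c(problems, "contains zero or negative weights")
  problems
}

# Illustrative inputs (hypothetical): a 5 x 5 distance matrix and two weight vectors
d <- as.matrix(dist(1:5))
good <- c(1.2, 0.8, 1, 2, 0.5)
bad  <- c(1.2, NA, 0, 2)

check_weights(good, d)  # character(0): no problems detected
check_weights(bad, d)   # flags the length mismatch, the NA and the zero weight
```

On the real data, the equivalent call would be check_weights(dataframe$weight, datadist).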

If anyone out there can help me understand what's going on (and how I can address or treat the problem), it would be greatly appreciated.

Hey, I ran the example in the vignette and I don't encounter this problem. Can you share dataframe with us? — StupidWolf, Nov 06 '19 at 21:47
Hi @StupidWolf, thank you for looking into my problem! Unfortunately, I cannot share the data frame with you (the data is restricted/not publicly-accessible). I was also able to reproduce the results from the vignette, which is what makes my current predicament even more frustrating! If you have any wild theories about what might be happening, I'm all ears. I guess what I'm trying to figure out is why cutree won't generate the results I'm asking for (if that makes sense). Thanks again! — J_Hol, Nov 06 '19 at 22:25
I suggest two things, 1. replace your members = dataframe$weight with members = runif(ncol(datadist)), if cutree still works on this, it means it's not the weights — StupidWolf, Nov 06 '19 at 22:32
then do 2., plot the tree out, plot(Cluster), maybe you cannot even cut at 4. — StupidWolf, Nov 06 '19 at 22:33
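StupidWolf's two suggestions can be sketched as follows. The real datadist cannot be shared, so a synthetic 20-case distance matrix stands in for it (hypothetical data). runif(ncol(datadist)) mirrors the suggested replacement weights. rect.hclust() from base R's stats package draws boxes around the four clusters on the dendrogram. The names random_members, test_tree and test_groups are illustrative only.

```r
# Synthetic stand-in for datadist (hypothetical): 20 random cases in 2-D
set.seed(1)
datadist <- as.matrix(dist(matrix(rnorm(40), ncol = 2)))

# Step 1: swap the survey weights for random positive weights.
# If cutree() gives four groups here, weighting as such is not the problem.
random_members <- runif(ncol(datadist))
test_tree <- hclust(as.dist(datadist), method = "ward.D", members = random_members)
test_groups <- cutree(test_tree, k = 4)
table(test_groups)

# Step 2: inspect the dendrogram and outline the four-cluster cut
plot(test_tree, labels = FALSE)
rect.hclust(test_tree, k = 4)
```

On the real data, one would drop the synthetic datadist and compare table(test_groups) with table(cutree(Cluster, k = 4)) from the weighted run.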
