I'm trying to conduct a hierarchical agglomerative cluster analysis in R using the WeightedCluster package. Before doing so, I calculated the distances between state sequences with the TraMineR package (see pp. 4-6 here).
Following the vignette linked above, I fed my distance matrix into hclust and added a vector of weights as follows (datadist is the distance matrix, dataframe is my data frame of time series data, and weight is an all-waves longitudinal survey weight):
Cluster <- hclust(as.dist(datadist), method = "ward", members = dataframe$weight)
Then, after settling on a specific cluster solution (four subgroups), I used the cutree function to assign cases to clusters and determine the relative frequency of each cluster:
subgroups <- cutree(Cluster, k = 4)
However, after running the code above I ended up with more than four groups (over 30, in fact). When I removed the vector of weights, I got frequencies for exactly four clusters, but unweighted results are sub-optimal for survey data like mine.
If anyone out there can help me understand what's going on (and how I can address or treat the problem), it would be greatly appreciated.
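In case it helps with reproducing the issue, here is a minimal sketch of the same steps on made-up data, with the checks one might run on the result. Everything prefixed with toy_ is an assumption for illustration (random points, random positive weights), not my real data. It uses method = "ward.D", which is the name recent versions of R give to the older "ward" option.

```r
set.seed(1)

# Toy stand-ins (assumptions, not the real data): 20 cases with random
# coordinates, their Euclidean distances, and random positive weights.
toy_points <- matrix(rnorm(40), ncol = 2)
toy_dist   <- dist(toy_points)
toy_weight <- runif(20, min = 0.5, max = 2)

# Weighted Ward clustering, then a four-cluster cut, as in the steps above.
toy_tree   <- hclust(toy_dist, method = "ward.D", members = toy_weight)
toy_groups <- cutree(toy_tree, k = 4)

# Checks: number of distinct groups, unweighted counts, weighted shares.
n_groups       <- length(unique(toy_groups))
unweighted_n   <- table(toy_groups)
weighted_share <- prop.table(xtabs(toy_weight ~ toy_groups))

print(n_groups)
print(unweighted_n)
print(weighted_share)

# Sanity checks on the weight vector itself.
print(anyNA(toy_weight))
print(length(toy_weight) == attr(toy_dist, "Size"))
```

Running the same checks on the real objects (datadist, dataframe$weight, subgroups) should show whether the extra groups are already present in the cutree output or only appear when the frequencies are tabulated.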