I'm trying to create a clustering method that combines K-Means and Agglomerative Clustering.
The first step is to apply the K-Means algorithm to group the data into 50 clusters, and to keep the centroids and labels obtained for each cluster.
Second, I'll display a dendrogram in order to choose an adequate number of clusters (>2).
Then I'll apply agglomerative hierarchical clustering (with AgglomerativeClustering) to the centroids obtained in step 1, using the number of clusters chosen in step 2.
Then I'll calculate the centroids for each new cluster.
Finally, I'll use the calculated centroids to consolidate these clusters with the K-Means algorithm (using the init argument of KMeans, which lets me specify the centroids the algorithm starts from).
To do this, I tried the following code:
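from sklearn.cluster import KMeans, AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram

# Step 1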
clf = KMeans(n_clusters = 50)
clf.fit(df)
labels = clf.labels_
centroids = clf.cluster_centers_
# Step 2
Z = linkage(df, method = 'ward', metric = 'euclidean')
dendrogram(Z, labels = labels, leaf_rotation = 90., color_threshold = 0)
Based on the dendrogram, I found that the best choice for the number of clusters was 3.
# Step 3
avg = AgglomerativeClustering(n_clusters = 3)
avg.fit(centroids)
labels_1 = avg.labels_
Z = linkage(centroids, method = 'ward', metric = 'euclidean')
dendrogram(Z, labels = labels_1, leaf_rotation = 90., color_threshold = 0)
But after this I'm lost: I don't know how to calculate the new centroids or how to write the final KMeans step.
What do you think of my steps? Did I do something wrong? What should I do to make this combination work? Thanks!
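
For reference, here is a minimal sketch of how the last two steps (new centroids, then the final K-Means) might look. It is only one possible reading of the plan above, not a confirmed solution. Assumptions: `X` is a synthetic stand-in for `df` generated with `make_blobs`, the names `km50`, `agg`, `merged_labels`, `new_centroids` and `final_km` are made up for illustration, and each new centroid is taken as the mean of the original points in the merged cluster (averaging the 50 K-Means centroids instead would be another option, but it ignores cluster sizes).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for df (assumption: df is a purely numeric table).
X, _ = make_blobs(n_samples=600, centers=3, n_features=2, random_state=0)

# Step 1: over-segment the data into 50 small clusters.
km50 = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)

# Step 3: merge the 50 centroids into 3 groups (Ward linkage is the default).
n_final = 3
agg = AgglomerativeClustering(n_clusters=n_final).fit(km50.cluster_centers_)

# Map every sample to its merged cluster: km50.labels_ gives the index of the
# small cluster for each sample, agg.labels_ gives the merged group of each
# small cluster, so indexing one with the other chains the two assignments.
merged_labels = agg.labels_[km50.labels_]

# Step 4: new centroid = mean of the original points in each merged cluster.
new_centroids = np.vstack(
    [X[merged_labels == k].mean(axis=0) for k in range(n_final)]
)

# Step 5: consolidate with K-Means started from these centroids.
# n_init=1 because the starting centroids are given explicitly.
final_km = KMeans(n_clusters=n_final, init=new_centroids, n_init=1).fit(X)
final_labels = final_km.labels_
```

With a real DataFrame, `X` would presumably be `df.to_numpy()` (or `df.values`), so that the boolean mask `merged_labels == k` can be used to select rows directly.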