
I'm trying to create a clustering method that combines K-Means and Agglomerative Clustering.

The first step would be to apply the K-Means algorithm to group the data into 50 clusters, which gives one centroid per cluster and a cluster label for each observation.
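A minimal sketch of step 1, assuming synthetic data from make_blobs stands in for df (the sample count, random_state and n_init values are illustrative choices, not from the question):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in for df: 1,000 synthetic 2-D points drawn around 3 true centres.
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

# Step 1: deliberately over-segment the data into 50 small clusters.
km50 = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)
labels = km50.labels_              # shape (1000,): small-cluster id (0..49) of each point
centroids = km50.cluster_centers_  # shape (50, 2): one centroid per small cluster
sizes = np.bincount(labels, minlength=50)  # number of points in each small cluster
```

The sizes array is not strictly needed, but it records how many observations each centroid represents, which matters once the centroids are treated as points in their own right.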

Second, from these centroids, I'll display a dendrogram in order to choose an appropriate number of clusters (>2).
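A sketch of step 2 as described here, building the Ward linkage on the 50 centroids rather than on every row of the data (make_blobs again stands in for df). Passing no_plot=True makes SciPy return the tree layout without drawing it; with matplotlib installed, drop that argument and call plt.show() to see the figure:

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
centroids = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X).cluster_centers_

# Ward linkage on the 50 centroids only: 49 merges, one row of Z per merge.
Z = linkage(centroids, method="ward", metric="euclidean")

# Leaves are labelled by centroid index (0..49) by default.
tree = dendrogram(Z, leaf_rotation=90., no_plot=True)
```

Working on the centroids keeps the linkage matrix small (50 leaves) no matter how many rows df has, which is the usual reason for running K-Means first.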

Then I'll apply agglomerative hierarchical clustering (AgglomerativeClustering) to the centroids obtained in step 1, using the number of clusters chosen in step 2.
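A sketch of step 3, again on make_blobs data standing in for df. Clustering the centroids gives one merged label per small cluster; indexing that array by each point's small-cluster label propagates the merge back to the original observations. Note that each centroid counts as one point here, regardless of how many rows it represents:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
km50 = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)

# Step 3: merge the 50 centroids into 3 groups (Ward linkage is the default).
agg = AgglomerativeClustering(n_clusters=3).fit(km50.cluster_centers_)
labels_1 = agg.labels_  # shape (50,): merged group of each small cluster

# Each point inherits the group of its small cluster.
point_groups = labels_1[km50.labels_]  # shape (1000,)
```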

Then I'll calculate the centroids for each new cluster.
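One way to compute those centroids, sketched on make_blobs data standing in for df: take the mean of the original points in each merged group (option A), or, without going back to the data, average the small centroids weighted by how many points each one holds (option B). Because each K-Means centroid is approximately the mean of its own points, the two should give nearly the same values:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
km50 = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)
labels_1 = AgglomerativeClustering(n_clusters=3).fit(km50.cluster_centers_).labels_
point_groups = labels_1[km50.labels_]

# Option A: mean of the original points in each merged group.
new_centroids = np.array([X[point_groups == g].mean(axis=0) for g in range(3)])

# Option B: size-weighted mean of the small centroids in each merged group.
sizes = np.bincount(km50.labels_, minlength=50)
weighted = np.array([
    np.average(km50.cluster_centers_[labels_1 == g], axis=0,
               weights=sizes[labels_1 == g])
    for g in range(3)
])
```

A plain unweighted mean of the centroids would let a small cluster pull the new centroid as strongly as a large one, which is why option B weights by cluster size.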

Finally, I'll use these centroids to consolidate the clusters with K-Means, via the init argument of KMeans, which lets you specify the centroids the algorithm starts from.
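A minimal sketch of step 5. KMeans accepts an array of shape (n_clusters, n_features) as init; with an explicit starting configuration, n_init=1 is the natural setting, since there is only one start to try. The start array below is a placeholder (the first three points of make_blobs data), not the merged centroids themselves:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

# Placeholder starting centroids; in the real pipeline these would be the
# merged-cluster centroids computed in the previous step.
start = X[:3].copy()

# n_clusters must match the number of rows in the init array.
final = KMeans(n_clusters=start.shape[0], init=start, n_init=1).fit(X)
final_labels = final.labels_             # final cluster of each point
final_centroids = final.cluster_centers_  # consolidated centroids
```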

To do this, I tried the following code:

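```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans

# df: the data to cluster (one row per observation)

# Step 1
clf = KMeans(n_clusters=50)
clf.fit(df)
labels = clf.labels_
centroids = clf.cluster_centers_

# Step 2
Z = linkage(df, method='ward', metric='euclidean')
dendrogram(Z, labels=labels, leaf_rotation=90., color_threshold=0)
plt.show()
```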

Based on the dendrogram, I concluded that the optimal number of clusters was 3.
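Reading the number of clusters off a dendrogram can also be made explicit with a common heuristic: cut the tree inside the largest jump between consecutive merge heights. This is only a rule of thumb, not the method used in the question. The sketch below uses make_blobs data standing in for df and ignores cuts that would leave fewer than min_k = 3 clusters, matching the (>2) constraint:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
centroids = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X).cluster_centers_
Z = linkage(centroids, method="ward")

n = len(centroids)
min_k = 3  # smallest number of clusters we are willing to accept

# Column 2 of Z holds the merge heights; Ward heights never decrease.
heights = Z[:, 2]
# Gap i lies between merge i and merge i + 1; cutting there leaves n - (i + 1)
# clusters, so keep only gaps whose cut leaves at least min_k clusters.
gaps = np.diff(heights)[: n - min_k]
i = int(np.argmax(gaps))
k = n - (i + 1)
```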

```python
from sklearn.cluster import AgglomerativeClustering

# Step 3
avg = AgglomerativeClustering(n_clusters=3)
avg.fit(centroids)
labels_1 = avg.labels_
Z = linkage(centroids, method='ward', metric='euclidean')
dendrogram(Z, labels=labels_1, leaf_rotation=90., color_threshold=0)
```

But after this I'm lost: I don't know how to calculate the new centroids or how to write the final KMeans step.
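Putting the five steps together, a sketch of the whole pipeline as one function. The name hybrid_kmeans and its parameters are hypothetical, and it assumes numeric data (a DataFrame such as df is converted with np.asarray). The new centroids are the means of the points in each merged group, and they seed the final KMeans through init:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs


def hybrid_kmeans(X, n_small=50, n_final=3, random_state=0):
    """Hypothetical helper: K-Means -> Ward merge of centroids -> K-Means refinement."""
    X = np.asarray(X, dtype=float)

    # Step 1: over-segment into n_small clusters.
    km_small = KMeans(n_clusters=n_small, n_init=10, random_state=random_state).fit(X)

    # Step 3: merge the small centroids into n_final groups (Ward by default),
    # then give every point the group of its small cluster.
    merge = AgglomerativeClustering(n_clusters=n_final).fit(km_small.cluster_centers_)
    point_groups = merge.labels_[km_small.labels_]

    # Step 4: centroid of each merged group = mean of its points.
    init = np.vstack([X[point_groups == g].mean(axis=0) for g in range(n_final)])

    # Step 5: consolidate with K-Means started from those centroids.
    km_final = KMeans(n_clusters=n_final, init=init, n_init=1).fit(X)
    return km_final.labels_, km_final.cluster_centers_


# Usage on synthetic data standing in for df.
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
labels, centers = hybrid_kmeans(X)
```

Step 2 (the dendrogram) stays outside the function because choosing n_final from it is a manual decision.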

What do you think of my steps? Did I do something wrong? What should I do to make this combination work? Thanks!
