I have some data and the pairwise distance matrix for those data points, and I want to cluster them using agglomerative clustering. I read that in sklearn the affinity can be set to 'precomputed', and I assume this means passing in the distance matrix. However, I could not find any example that uses a precomputed affinity with a custom distance matrix. Any help will be appreciated.
Let's call your distance matrix D.
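from sklearn.cluster import AgglomerativeClustering
# Note: scikit-learn 1.2 and later name this parameter `metric`; newer releases no longer accept `affinity`.
agg = AgglomerativeClustering(n_clusters=5, affinity='precomputed', linkage='average')
labels = agg.fit_predict(D)  # Returns the cluster label of each data point.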
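To try the call above end to end, here is a minimal sketch. The toy array `X`, the way `D` is built from it, and the helper `make_model` are assumptions for illustration and are not part of the original answer. The helper exists only because the keyword is `affinity` in older scikit-learn releases and `metric` in newer ones.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AgglomerativeClustering

# Toy data (illustrative only): two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Square, symmetric pairwise distance matrix with a zero diagonal.
# Any custom distance matrix of shape (n_samples, n_samples) could be used here.
D = squareform(pdist(X, metric="euclidean"))


def make_model(n_clusters):
    """Hypothetical helper: build the model under either keyword name."""
    try:
        # Newer scikit-learn releases call the parameter `metric`.
        return AgglomerativeClustering(
            n_clusters=n_clusters, metric="precomputed", linkage="average")
    except TypeError:
        # Older releases only accept `affinity`.
        return AgglomerativeClustering(
            n_clusters=n_clusters, affinity="precomputed", linkage="average")


# Pass the distance matrix itself, not the raw data, to fit_predict.
labels = make_model(2).fit_predict(D)
print(labels)
```

The linkage here is 'average'; 'complete' and 'single' also accept a precomputed matrix, whereas 'ward' requires Euclidean distances computed from the raw data.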
If you're interested in generating the entire hierarchy and producing a dendrogram, scikit-learn's API wraps the scipy hierarchical clustering code. Just use the scipy code directly.
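As a rough sketch of the scipy route, assuming the same kind of square distance matrix `D` (the toy data below is illustrative only): `scipy.cluster.hierarchy.linkage` expects the condensed form of the distance matrix, so the square matrix is converted with `squareform` first.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Toy data (illustrative only): two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
D = squareform(pdist(X))  # square (n, n) distance matrix

# linkage() wants the condensed (upper-triangle) vector, not the square matrix.
condensed = squareform(D)
Z = linkage(condensed, method="average")  # full merge hierarchy, shape (n - 1, 4)

# Cut the hierarchy into at most 2 flat clusters (labels start at 1).
flat_labels = fcluster(Z, t=2, criterion="maxclust")

# Compute the dendrogram layout without drawing it; drop no_plot=True
# (with matplotlib installed) to render the figure.
tree = dendrogram(Z, no_plot=True)
print(flat_labels, tree["leaves"])
```

Passing the square matrix straight to `linkage` would be interpreted as raw observations rather than distances, which is why the conversion step matters.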

eric

Arya McCarthy
- Thanks a lot. It helped. – B bonita Jun 30 '17 at 16:16
- If this answered your question, I encourage you to mark as the correct answer using the checkbox beside it. This benefits you, me, and people who have the same problem further down the line. Otherwise, what can be clarified? – Arya McCarthy Mar 27 '18 at 21:36
- I got an error using this code: precomputed was provided as affinity. Ward can only work with euclidean distances. – Jingpeng Wu Jul 27 '19 at 20:28
- The [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html) makes it clear that this will break: 'If linkage is “ward”, only “euclidean” is accepted.' – Arya McCarthy Jul 28 '19 at 21:34