Specify max distance in agglomerative clustering (scikit learn)

Question

When using a clustering algorithm, you always have to specify a shutoff parameter.

I am currently using agglomerative clustering with scikit-learn, and the only shutoff parameter that I can see is the number of clusters.

from sklearn.cluster import AgglomerativeClustering
agg_clust = AgglomerativeClustering(n_clusters=N)
y_pred = agg_clust.fit_predict(matrix)

But I would like to find an algorithm where you specify the maximum distance between elements of a cluster, not the number of clusters. The algorithm would then simply agglomerate clusters until the max distance is reached.

Any suggestions?

score 1 · Accepted Answer · answered Aug 10 '18 at 11:00

What you are looking for is implemented in scipy.cluster.hierarchy (see the SciPy documentation).

So here is how you can do it:

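from scipy.cluster.hierarchy import linkage, fcluster

# t is the distance threshold. linkage() defaults to method='single';
# with 'complete' linkage, t bounds the distance between any two points of a cluster.
y_pred = fcluster(linkage(matrix, method='complete'), t, criterion='distance')

# or, more directly
from scipy.cluster.hierarchy import fclusterdata
y_pred = fclusterdata(matrix, t, criterion='distance', method='complete')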
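
As a hedged illustration of how the linkage method affects what `t` means: `linkage` uses single linkage unless told otherwise, and with single linkage the threshold bounds the distance at which clusters are merged, not the largest distance inside a cluster. The sketch below uses made-up 1-D points and a hypothetical helper, `max_within_cluster_distance`; neither comes from the original answer.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

# Made-up 1-D observations (one row per point), chosen for illustration only.
points = np.array([[0.0], [1.0], [2.1], [3.3]])
t = 1.5  # distance threshold passed to fcluster


def max_within_cluster_distance(data, labels):
    """Largest pairwise distance between two points sharing a label (hypothetical helper)."""
    dists = squareform(pdist(data))
    same_cluster = labels[:, None] == labels[None, :]
    return dists[same_cluster].max()


# Same threshold, two linkage methods.
labels_single = fcluster(linkage(points, method="single"), t, criterion="distance")
labels_complete = fcluster(linkage(points, method="complete"), t, criterion="distance")

single_diameter = max_within_cluster_distance(points, labels_single)
complete_diameter = max_within_cluster_distance(points, labels_complete)

print("single:  ", labels_single, "max within-cluster distance:", single_diameter)
print("complete:", labels_complete, "max within-cluster distance:", complete_diameter)
```

On this toy data, single linkage chains all four points into one cluster whose widest pair is farther apart than `t`, while complete linkage keeps every within-cluster distance at or below `t`. If the goal is a hard cap on the distance between members of a cluster, complete linkage is probably the method to pick.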

Ugurite


What is the variable `t`? – Worm Oct 02 '19 at 14:06
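`t` in this example is a scalar distance threshold: observations in the same flat cluster have a cophenetic distance of at most `t`. With complete linkage this is the maximum distance allowed between two elements of the same cluster; with the default single linkage it only bounds the distance at which clusters are merged. More info [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.fcluster.html#scipy.cluster.hierarchy.fcluster). – Ugurite Oct 02 '19 at 22:19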
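
For readers who would rather stay within scikit-learn: later releases of `AgglomerativeClustering` accept a `distance_threshold` argument, which has to be combined with `n_clusters=None`. The sketch below assumes scikit-learn 0.21 or newer and uses made-up 2-D data; it is not part of the original answer, and the exact handling of merges that fall exactly on the threshold is best checked in the scikit-learn documentation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up 2-D observations: two tight pairs and one isolated point.
X = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [5.0, 5.0],
    [5.0, 6.2],
    [10.0, 0.0],
])
t = 2.0  # clusters farther apart than this (complete linkage) are not merged

# n_clusters must be None when distance_threshold is given.
agg_clust = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=t,
    linkage="complete",
)
y_pred = agg_clust.fit_predict(X)

print("labels:", y_pred)
print("number of clusters found:", agg_clust.n_clusters_)
```

Here the number of clusters is an output (`n_clusters_`) rather than an input, which is what the question asks for.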
