Let's say I have a hierarchical clustering like the one in the diagram below. To get the cluster labels, I need to choose a suitable distance threshold. For example, if I put the threshold at 0.32, I would probably get 3 clusters, and if I set it at around 3.5, I would get 2 clusters from this diagram.
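For reference, here is a minimal sketch of the threshold-based cut using SciPy's `linkage` and `fcluster(..., criterion="distance")`. The 1-D coordinates are hypothetical. They were chosen only so the tree has the shape described below: p3/p4 and p2/p5 merge first, then those two groups, then p1. Their merge heights under single linkage (0.2, 0.5, 1.5, 7.8) do not match the diagram, so here the thresholds 1.0 and 5.0 play the roles of 0.32 and 3.5. The `groups` helper is hypothetical as well.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 1-D coordinates for p1..p5 (not the data behind the diagram).
names = ["p1", "p2", "p3", "p4", "p5"]
X = np.array([[10.0], [0.0], [2.0], [2.2], [0.5]])

# Single linkage merges p3+p4 at 0.2, p2+p5 at 0.5, those two at 1.5, p1 at 7.8.
Z = linkage(X, method="single")

def groups(labels):
    """Hypothetical helper: fcluster labels -> set of frozensets of point names."""
    out = {}
    for name, lab in zip(names, labels):
        out.setdefault(lab, set()).add(name)
    return {frozenset(g) for g in out.values()}

# Cutting by cophenetic distance: the threshold must be picked by hand.
three = groups(fcluster(Z, t=1.0, criterion="distance"))
two = groups(fcluster(Z, t=5.0, criterion="distance"))
print(three)
print(two)
```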
Instead of using a fixed distance threshold, I would like to get the cluster labels based on the merging order, i.e. define the clusters by the first merge, the second merge, and so on.
For example, here I would like the cluster labels once at least the first merge has happened, which would give 3 clusters:
cluster1: p1
cluster2: p3 and p4
cluster3: p2 and p5.
If I instead ask for the clustering once at least the second merge has happened, I would get 2 clusters:
cluster1: p1
cluster2: p3, p4, p2 and p5.
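One possible way to express this with SciPy is sketched below. It assumes that "first merge" means the lowest level of the dendrogram's topology. A leaf has level 0, and each merged node has level 1 + the larger level of its two children. In the example, p3+p4 and p2+p5 are both level 1, their union is level 2, and the final merge with p1 is level 3. The levels are passed to `fcluster` through its `monocrit` criterion. That criterion forms a flat cluster from every node whose monocrit value is at most `t`, and it requires the values not to decrease towards the root, which these levels satisfy. `merge_levels` and `labels_by_merge_level` are hypothetical helpers, not SciPy functions, and the coordinates are made up to reproduce the tree shape only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def merge_levels(Z):
    """Hypothetical helper: topological level of each merge (row) in linkage matrix Z.

    A leaf has level 0; the node created by row i has level
    1 + max(level of its two children). Levels grow towards the root,
    so they are a valid (monotonic) `monocrit` array for fcluster.
    """
    n = Z.shape[0] + 1
    levels = np.zeros(n - 1, dtype=np.float64)  # fcluster expects float64 monocrit
    for i in range(n - 1):
        a, b = int(Z[i, 0]), int(Z[i, 1])
        # Index < n is an original point (level 0); index n + j is the node from row j.
        la = 0.0 if a < n else levels[a - n]
        lb = 0.0 if b < n else levels[b - n]
        levels[i] = 1.0 + max(la, lb)
    return levels

def labels_by_merge_level(Z, k):
    """Hypothetical helper: flat clusters that include every merge of level <= k."""
    return fcluster(Z, t=k, criterion="monocrit", monocrit=merge_levels(Z))

names = ["p1", "p2", "p3", "p4", "p5"]
X = np.array([[10.0], [0.0], [2.0], [2.2], [0.5]])  # hypothetical coordinates
Z = linkage(X, method="single")

for k in (1, 2):
    labels = labels_by_merge_level(Z, k)
    print(k, dict(zip(names, labels.tolist())))
```

With k = 1 this should group {p1}, {p3, p4} and {p2, p5}, and with k = 2 it should group {p1} and {p2, p3, p4, p5}. The label numbers themselves are arbitrary.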
Does scipy have a built-in method to extract this kind of information? If not, is there any way I can extract it from the hierarchical clustering? Any suggestions would be great.
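On the built-in side, "merge order" might simply mean the rows of the linkage matrix. SciPy records one merge per row, ordered by merge height for monotonic methods such as single linkage. Under that reading, after the first i merges there are n - i clusters, and `scipy.cluster.hierarchy.cut_tree` with `n_clusters` returns that cut. Note that this counts individual merges rather than dendrogram levels. In the five-point example, the three-cluster result in the question corresponds to two merges (p3+p4 and p2+p5). This is a sketch with hypothetical coordinates and a hypothetical helper `labels_after_merges`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

names = ["p1", "p2", "p3", "p4", "p5"]
X = np.array([[10.0], [0.0], [2.0], [2.2], [0.5]])  # hypothetical coordinates
Z = linkage(X, method="single")

def labels_after_merges(Z, i):
    """Hypothetical helper: cluster labels after the first i rows (merges) of Z."""
    n_obs = Z.shape[0] + 1
    # cut_tree returns one column per requested cluster count.
    return cut_tree(Z, n_clusters=[n_obs - i])[:, 0]

after_two = labels_after_merges(Z, 2)    # 3 clusters
after_three = labels_after_merges(Z, 3)  # 2 clusters
print(dict(zip(names, after_two.tolist())))
print(dict(zip(names, after_three.tolist())))
```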
Example cases:
The idea is that I don't want to hardcode a threshold to define the number of clusters, but rather find the clusters from their merging order. For example, with p1, p2 and p3, in one case p1 and p2 fall in the same cluster at a distance of 0.32. In another case, more data is added for p1, p2 and p3; they may still fall in the same clusters, but the distance at which their clusters merge may have changed. Since p1 and p2 are still in the same cluster, the distance threshold used to define the clusters is irrelevant here.
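To illustrate this point, here is a small sketch with two hypothetical versions of the same five points. Their spacing differs, standing in for "more data is added", but their merge order is the same. A fixed distance threshold gives different clusterings for the two versions. Cutting after a fixed number of merges with `cut_tree` gives the same grouping both times. The coordinates and the `partition` helper are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cut_tree

# Two hypothetical versions of p1..p5 (rows): the spacing changes, but the
# merge order stays p3+p4, p2+p5, then those two groups, then p1.
X_before = np.array([[10.0], [0.0], [2.0], [2.2], [0.5]])
X_after = np.array([[30.0], [0.0], [6.0], [7.5], [2.0]])

def partition(labels):
    """Hypothetical helper: label array -> set of frozensets of point indices."""
    labels = np.asarray(labels).tolist()
    return {frozenset(i for i, lab in enumerate(labels) if lab == c) for c in set(labels)}

results = []
for X in (X_before, X_after):
    Z = linkage(X, method="single")
    n_obs = Z.shape[0] + 1
    by_distance = partition(fcluster(Z, t=1.0, criterion="distance"))  # fixed threshold
    by_merges = partition(cut_tree(Z, n_clusters=[n_obs - 2])[:, 0])  # after two merges
    results.append((by_distance, by_merges))
    print(len(by_distance), "clusters by distance,", len(by_merges), "after two merges")
# Expected: 3 and 3 for X_before, 5 and 3 for X_after; the two-merge groups are identical.
```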