I have been working with HDBSCAN and have a few hundred clusters from my data. I am trying to select some clusters for further analysis: specifically, clusters with a high inter-cluster distance, i.e. ones that are more spread out and behave a bit more outlier-ish than the rest. So far I have been working with the -1 (noise) category, but I realized that cluster.probabilities_ is 0 for all of these points. I need this value for further analysis.
My questions are:
- What does the cluster.probabilities_ score say about a cluster?
- Is there any way (other than just choosing the -1 category) to select other clusters that might also contain outliers, for example by calculating the inter-cluster distance, or in some other way?
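
To make the second question concrete, here is one possible way to rank clusters by how spread out they are. This is a sketch only. The helper cluster_summary is hypothetical, and mean distance to the centroid is an assumed stand-in for "spread", not something HDBSCAN computes itself; it can be misleading for non-convex clusters. The toy arrays are hand-made for illustration and are not HDBSCAN output.

```python
import numpy as np


def cluster_summary(X, labels, probabilities):
    """Hypothetical helper: per-cluster size, mean membership
    probability, and mean distance of members to the cluster centroid."""
    summary = {}
    for label in np.unique(labels):
        if label < 0:  # skip the noise category
            continue
        mask = labels == label
        points = X[mask]
        centroid = points.mean(axis=0)
        spread = np.linalg.norm(points - centroid, axis=1).mean()
        summary[int(label)] = {
            "size": int(mask.sum()),
            "mean_probability": float(probabilities[mask].mean()),
            "mean_dist_to_centroid": float(spread),
        }
    return summary


# Hand-made toy input (an assumption, not real clustering output).
X = np.array([[0.0, 0.0], [0.0, 2.0],
              [10.0, 10.0], [10.0, 16.0],
              [50.0, 50.0]])
labels = np.array([0, 0, 1, 1, -1])
probabilities = np.array([1.0, 0.9, 1.0, 0.5, 0.0])

summary = cluster_summary(X, labels, probabilities)

# Most spread-out clusters first.
ranked = sorted(summary,
                key=lambda k: summary[k]["mean_dist_to_centroid"],
                reverse=True)
print(ranked)
for label in ranked:
    print(label, summary[label])
```

With real output, labels and probabilities would come from cluster.labels_ and cluster.probabilities_. A cluster with a low mean probability or a large spread would then be a candidate for closer inspection.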