
I have been working with HDBSCAN and have a few hundred clusters based on my data. I am trying to select some cluster groups for further analysis. I am looking for clusters with a high inter-cluster distance, i.e. clusters that are more spread out and behave a bit more outlier-ish than the rest. So far I have been working with the -1 (noise) category, but I realized that cluster.probabilities_ is 0 for all of those points. I need this value for further analysis.

My questions are:

  1. What does the cluster.probabilities_ score say about a cluster?
  2. Is there any way, other than just choosing the -1 category, to select other clusters that might also contain outliers, for example by calculating the inter-cluster distance or some other measure?
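One way to make the inter-cluster-distance idea in question 2 concrete is the minimal NumPy sketch below. It assumes `X` is the data the model was fit on and `labels` is the fitted model's `labels_`; the helper name `cluster_isolation` is hypothetical, and representing each cluster by its mean with Euclidean distances is an assumption that only roughly fits HDBSCAN's often non-convex clusters. Clusters whose nearest neighbouring centroid is far away, or whose spread is large, might be candidates for the "outlier-ish" groups described above.

```python
import numpy as np


def cluster_isolation(X, labels):
    """Return cluster ids, their spread and the distance to the nearest other cluster.

    Hypothetical helper. Noise points (label -1) are ignored, each cluster is
    represented by the mean of its points, and distances are Euclidean.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels[labels != -1])
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])

    # Spread: mean distance of a cluster's points to its own centroid.
    spread = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])

    # Pairwise centroid distances; the diagonal is set to inf so that a
    # cluster is never counted as its own nearest neighbour.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.min(axis=1)
    return ids, spread, nearest


# Toy data standing in for a fitted model's input and labels_.
X = [[0, 0], [0, 2], [10, 0], [10, 2], [50, 0], [50, 4], [100, 100]]
labels = [0, 0, 1, 1, 2, 2, -1]
ids, spread, nearest = cluster_isolation(X, labels)
# Print clusters from most to least isolated: id, spread, nearest distance.
for i in np.argsort(-nearest):
    print(ids[i], round(spread[i], 2), round(nearest[i], 2))
```

The centroid-based distance is cheap for a few hundred clusters; a medoid or a minimum point-to-point distance would be more faithful for elongated clusters but costs more to compute.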
Jazz

1 Answer

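  1. cluster.probabilities_ gives, for each data point, the strength with which that point belongs to its assigned cluster, from 0 to 1. It is a per-point value rather than a score for the cluster as a whole, and noise points always get 0.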

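  2. -1 means that the data point has been labeled as noise, which is why its probability is 0. If you want those points allocated to clusters anyway, soft clustering might be a solution (in the hdbscan library, fit with prediction_data=True and then call hdbscan.all_points_membership_vectors).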
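To see what `probabilities_` says at the cluster level, the per-point strengths can be aggregated by label. This is a minimal sketch: `membership_summary` is a hypothetical helper, its inputs stand in for `clusterer.labels_` and `clusterer.probabilities_`, and the numbers in the usage lines are made up for illustration, not real HDBSCAN output. A cluster whose members have a low mean or minimum strength is held together more loosely, which may serve as one heuristic for spotting outlier-ish clusters.

```python
import numpy as np


def membership_summary(labels, probabilities):
    """Size, mean and minimum membership strength for every label.

    Hypothetical helper: `labels` and `probabilities` are assumed to be the
    labels_ and probabilities_ arrays of a fitted HDBSCAN model. Label -1
    (noise) is kept so you can see that its strengths are all 0.
    """
    labels = np.asarray(labels)
    probabilities = np.asarray(probabilities, dtype=float)
    summary = {}
    for c in np.unique(labels):
        p = probabilities[labels == c]
        summary[int(c)] = {
            "size": int(p.size),
            "mean": float(p.mean()),
            "min": float(p.min()),
        }
    return summary


# Made-up values for illustration only, not real HDBSCAN output.
labels = [0, 0, 0, 1, 1, -1, -1]
probabilities = [1.0, 0.8, 0.3, 1.0, 0.9, 0.0, 0.0]
for label, stats in membership_summary(labels, probabilities).items():
    print(label, stats)
```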

sogu