I have a dataset of 6 elements. I computed the distance matrix using Gower distance, which resulted in the following matrix:
By just looking at this matrix, I can tell that element #0 is similar to element #4 and #5 the most, so I assumed the output of the HDBSCAN would be to cluster those together, and assume the rest are outliers; however, that wasn't the case.
clusterer = hdbscan.HDBSCAN(min_cluster_size=2, min_samples=3, metric='precomputed',cluster_selection_epsilon=0.1, cluster_selection_method = 'eom').fit(distance_matrix)
Clusters Formed:
Cluster 0: {element #0, element #2}
Cluster 1: {element #4, element #5}
Outliers: {element #1, element #3}
which is a behavior I don't understand. Also, both parameters cluster_selection_epsilon
and cluster_selection_method
don't seem to have an effect on my results at all and I don't understand why.
I tried changing the parameters again to min_cluster_size=2, min_samples=1
Clusters Formed:
Cluster 0: {element #0, element #2,element #4, element #5}
Cluster 1: {element #1, element #3}
and any other change in the parameters resulted in all points classified as outliers.
Can someone please help explain this behavior, and explain why cluster_selection_epsilon
and cluster_selection_method
don't affect the clusters formed. I thought that by setting cluster_selection_epsilon
to 0.1, I'd be ensuring that the points inside a cluster would be of distance 0.1 or less apart (so that element #0 and element #2 aren't clustered together for instance)