Retrieving members of a cluster with HDBSCAN

Question

I have some string data that I manipulate and then cluster using HDBSCAN:

import hdbscan

# Cast the hash column to strings; `train` is the training DataFrame
textData = train['eudexHash'].apply(str)
clusterer = hdbscan.HDBSCAN(min_cluster_size=5,
                            gen_min_span_tree=True,
                            prediction_data=True).fit(textData.values.reshape(-1, 1))

Now, when I call approximate_predict on the fitted clusterer, I get this result:

>>> hdbscan.approximate_predict(clusterer, testCase)
(array([113]), array([1.]))

Sweet, looks like it's predicting new cases: it assigns the new string value to label 113. Now, how do I find which other members are in that label/bucket/cluster?

Cheers!

score 1 · Accepted Answer · answered Nov 19 '19 at 16:28

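If you want to find out which of your training points belong to label 113, you can index the training data with a boolean mask built from clusterer.labels_, which holds one label per training row: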

textdata_with_label_113 = textData[clusterer.labels_ == 113]
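
As a minimal sketch of how that mask works, the example below uses small stand-in arrays instead of a fitted model. The names `labels`, `values`, `predicted_labels`, and `members_of` are hypothetical and chosen for illustration. `labels` plays the role of clusterer.labels_, and `predicted_labels` plays the role of the first array returned by approximate_predict.

```python
import numpy as np

# Stand-in for clusterer.labels_: one label per training row, -1 marks noise.
labels = np.array([0, 113, -1, 113, 0, 113])
# Stand-in for the training values, in the same row order passed to fit().
values = np.array(["a1", "b2", "c3", "d4", "e5", "f6"])

# Stand-in for the first element returned by hdbscan.approximate_predict.
predicted_labels = np.array([113])


def members_of(label, labels, values):
    """Return the training values whose cluster label equals `label`."""
    mask = labels == label  # elementwise comparison yields a boolean array
    return values[mask]


members = members_of(predicted_labels[0], labels, values)
print(members)  # ['b2' 'd4' 'f6']
```

The same pattern should carry over to the real objects as long as the data being indexed has the same row order as the data passed to fit.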

user2653663


Hey thanks a lot, I didn't think that the indexing would be like '=='. Really I was expecting another call after clusterer.labels_.something to get all the members under a label! Thanks bud! – DavimusPrime Nov 19 '19 at 17:21
