I have some string data that I manipulate and then cluster using HDBSCAN:

import hdbscan

# Convert each eudex hash to a string, then cluster on it as a single feature column
textData = train['eudexHash'].apply(str)
clusterer = hdbscan.HDBSCAN(min_cluster_size=5,
                            gen_min_span_tree=True,
                            prediction_data=True).fit(textData.values.reshape(-1, 1))

Now, when I call approximate_predict on the fitted clusterer with a new case, I get this result:

>>> hdbscan.approximate_predict(clusterer, testCase)
(array([113]), array([1.]))

Sweet, it looks like it's predicting new cases: it assigns the new string value to label 113. Now, how do I find which other members are in that label/bucket/cluster?

Cheers!

DavimusPrime

1 Answer

If you want to find out which of your training data points belong to label 113, you can index the training data with a boolean mask built from clusterer.labels_:

textdata_with_label_113 = textData[clusterer.labels_ == 113]
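
To show why the `==` works, here is a minimal sketch using small made-up arrays in place of the real data. `values` and `labels` are hypothetical stand-ins for `textData.values` and `clusterer.labels_`; the contents are invented for illustration. Comparing a NumPy array to a scalar gives a boolean array of the same length, and indexing with that array keeps only the positions that are True. HDBSCAN marks noise points with the label -1, so the grouping step skips it.

```python
import numpy as np

# Hypothetical stand-ins for textData.values and clusterer.labels_
values = np.array(["a1", "b2", "c3", "d4", "e5", "f6"])
labels = np.array([113, -1, 113, 7, 7, 113])

target = 113

# Elementwise comparison produces a boolean mask, one entry per training point
mask = labels == target

# Boolean indexing keeps only the points whose mask entry is True
members = values[mask]

# Positions of those points in the original training data
member_positions = np.flatnonzero(mask)

# Optional: collect the members of every cluster at once, skipping noise (-1)
members_by_label = {
    int(lab): values[labels == lab].tolist()
    for lab in np.unique(labels)
    if lab != -1
}
```

With the real objects, the label returned by approximate_predict can be plugged in as `target`, assuming the arrays are in the same order as the data passed to fit.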
user2653663
  • Hey thanks a lot, I didn't think that the indexing would be like '=='. Really I was expecting another call after clusterer.labels_.something to get all the members under a label! Thanks bud! – DavimusPrime Nov 19 '19 at 17:21