I'd like to extract the original points that form each cluster. I know that HDBSCAN doesn't have cluster centers, so I assumed that each label corresponds to the original point at the same position and tried the following, but the results are really bad:

  hd = hdscan.labels_
  df['s1'] = np.where(hd == 0, df['Close'], np.nan)

1 Answer

You can use the NearestCentroid classifier from sklearn to compute a centroid for each HDBSCAN cluster. For my use case, I used the following function to get the cluster centers:

import pandas as pd
from sklearn.neighbors import NearestCentroid

def get_cluster_centers(clustering_df, metrics_by_col='CLUSTER_NO'):
    # Feature columns used for clustering (specific to my data set)
    model_cols = ['CREATEDTTM', 'LAT_GEOCODER', 'LNG_GEOCODER']
    # With the default Euclidean metric, centroids_ holds the mean of each label's rows.
    # HDBSCAN noise (label -1), if present, gets a row as well.
    clf = NearestCentroid()
    clf.fit(clustering_df[model_cols], clustering_df[metrics_by_col])
    centers_df = pd.DataFrame(clf.centroids_, columns=model_cols)
    centers_df['classes'] = clf.classes_
    centers_df.set_index('classes', inplace=True)
    return centers_df

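Here clustering_df is the scaled dataframe, so the returned centers are in scaled units as well. You should generally normalize the features before clustering, because distance-based methods are sensitive to feature scale.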
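As for the original points themselves: labels_ has one entry per input row, in the same order, so a boolean mask on the labels selects the rows of each cluster. The snippet below is a minimal sketch of that idea using only numpy and pandas. The toy data, the hard-coded labels array (standing in for the fitted clusterer's labels_) and the helper names points_by_cluster and cluster_means are made up for illustration. Grouping the unscaled frame by label also gives per-cluster means in the original units, and noise points (label -1) are skipped.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the original, unscaled frame (illustrative values).
df = pd.DataFrame({
    "Close": [10.0, 10.5, 9.8, 50.0, 51.0, 49.5, 200.0],
    "Volume": [1.0, 1.2, 0.9, 5.0, 5.5, 4.8, 30.0],
})

# Hypothetical labels standing in for the fitted clusterer's labels_;
# HDBSCAN marks noise points with -1.
labels = np.array([0, 0, 0, 1, 1, 1, -1])


def points_by_cluster(frame, labels):
    """Return {label: rows of frame carrying that label}, skipping noise (-1)."""
    labels = np.asarray(labels)
    if len(labels) != len(frame):
        raise ValueError("labels must have one entry per row of frame")
    return {int(k): frame[labels == k] for k in np.unique(labels) if k != -1}


def cluster_means(frame, labels):
    """Per-cluster column means of frame, excluding noise (-1)."""
    labels = np.asarray(labels)
    keep = labels != -1
    return frame[keep].groupby(labels[keep]).mean()


clusters = points_by_cluster(df, labels)
centers = cluster_means(df, labels)
print(clusters[0])
print(centers)
```

If the rows come back looking wrong, it is worth checking that the frame passed to the clusterer and the frame being masked have the same rows in the same order (for example, no rows dropped or re-sorted in between).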

Gaurav Sitaula