
I'm using DBSCAN from sklearn and HDBSCAN to cluster some documents.

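from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words=mystopwords)
X = vectorizer.fit_transform(y)  # y is the list of raw documents
dbscan = DBSCAN(eps=0.75, min_samples=9)
clusters = dbscan.fit_predict(X)  # label -1 marks noise points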

Now, how can I get the top terms in each cluster? With k-means, we would do something like this:

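terms = vectorizer.get_feature_names_out()
# Sort each centroid's term weights in descending order
order_centroids = kmeans_model.cluster_centers_.argsort()[:, ::-1]
for i in range(true_k):
    print("Cluster %d:" % i)
    for ind in order_centroids[i, :true_k]:  # top true_k terms per cluster
        print(' %s' % terms[ind])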

But DBSCAN and HDBSCAN don't give us centroids. How can I find the top terms in each cluster produced by DBSCAN or HDBSCAN?
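
One idea I have been considering, as a rough sketch only: since there are no centroids, average the TF-IDF rows of the documents that share a label and treat that mean vector as a pseudo-centroid, then rank terms by weight as in the k-means case. The helper name `top_terms_per_cluster` and its `n_terms` parameter are my own placeholders, not part of sklearn or hdbscan, and I am assuming the label `-1` marks noise and should be skipped.

```python
import numpy as np


def top_terms_per_cluster(X, labels, terms, n_terms=10):
    """Return {cluster_label: [top terms]} ranked by mean TF-IDF weight.

    Hypothetical helper (not part of sklearn or hdbscan).
    X may be a dense array or a SciPy sparse matrix of shape (n_docs, n_terms).
    """
    labels = np.asarray(labels)
    terms = np.asarray(terms)
    top_terms = {}
    for label in np.unique(labels):
        if label == -1:
            continue  # assumption: -1 is the noise label, so skip it
        mask = labels == label
        # The mean of the member rows stands in for a centroid.
        pseudo_centroid = np.asarray(X[mask].mean(axis=0)).ravel()
        # Indices of the highest-weighted terms, largest first.
        order = pseudo_centroid.argsort()[::-1][:n_terms]
        top_terms[int(label)] = terms[order].tolist()
    return top_terms
```

With the variables from my code above, this would be called as `top_terms_per_cluster(X, clusters, vectorizer.get_feature_names_out())`. I am not sure whether a mean vector is a meaningful summary for density-based clusters, which can have irregular shapes, so this may only be an approximation.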
