
I looked at unsupervised clustering in gensim, fastText, and sklearn, but could not find any documentation on clustering text data with unsupervised learning without specifying the number of clusters to be identified.

For example, in sklearn's KMeans clustering:

km = KMeans(n_clusters=true_k, init='k-means++', max_iter=100)

Here I have to provide n_clusters.

In my case, I have text, and the number of clusters should be identified automatically before the text is clustered. Any reference article or link would be much appreciated.

Nipun Wijerathne
user2129623
  • Have you gone through the [overview of clustering methods](http://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods) in scikit-learn? There are a few of them that do not have the number of clusters directly as parameter. – jdehesa Sep 20 '18 at 13:16

1 Answer


DBSCAN is a density-based clustering method that does not require specifying the number of clusters beforehand.

sklearn implementation: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

Here is a good tutorial that gives an intuitive understanding of DBSCAN: http://mccormickml.com/2016/11/08/dbscan-clustering/

I extracted the following from the above tutorial, which may be useful for you.

k-means requires specifying the number of clusters, ‘k’. DBSCAN does not, but does require specifying two parameters which influence the decision of whether two nearby points should be linked into the same cluster.

These two parameters are a distance threshold, ε (epsilon), and “MinPts” (minimum number of points), to be explained.
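As a rough illustration of how these two parameters map onto sklearn, here is a minimal sketch for text data. The toy corpus, the TF-IDF vectorization, the cosine metric, and the values `eps=0.5` and `min_samples=2` are all assumptions chosen for the example, not recommendations; in sklearn, ε is the `eps` argument and MinPts is `min_samples`. Real data will usually need these values tuned.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, made up purely for illustration.
docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock markets fell sharply today",
    "stock markets rose sharply today",
    "quantum entanglement puzzles physicists",
]

# Turn each document into a TF-IDF vector (sparse matrix, one row per document).
X = TfidfVectorizer().fit_transform(docs)

# eps is the distance threshold, min_samples plays the role of MinPts.
# Both values here are assumptions picked for this tiny corpus.
db = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit(X)

labels = db.labels_  # one label per document; -1 marks noise
# Number of clusters found, not counting the noise label.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

for doc, label in zip(docs, labels):
    print(label, doc)
print("clusters found:", n_clusters)
```

Note that no cluster count is passed in: the two cat sentences and the two stock-market sentences should each end up in their own cluster, while the last sentence has no close neighbours and is labelled as noise (-1).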

There are other methods as well (follow the link given in the comments); however, DBSCAN is a popular choice.

Nipun Wijerathne