Bisecting KMeans for Document Clustering

Question

I'm currently doing research on document clustering. I want to run Bisecting KMeans in Java on my data set (text documents). Can anyone provide code for this? The final runs are going to be on Hadoop using MapReduce.

Thank you.

score 0 · Answer 1 · answered Feb 12 '15 at 06:58

Have you looked at Mahout or Spark MLlib for your clustering algorithms? These are the de facto industry standards for machine learning on Hadoop. Both libraries have K-Means (among many others), but neither has a released version of Bisecting K-Means. There is an open pull request on the Spark project on GitHub for Hierarchical K-Means (SPARK-2429), though I'm not sure whether this is the same as Bisecting K-Means.

Another point: consider Spark instead of MapReduce. For iterative algorithms such as K-Means, Spark performs much better, because it can keep the data set in memory between iterations instead of rereading it from disk on every pass.
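
If you end up writing it yourself, the core of bisecting K-Means is small: start with everything in one cluster, then repeatedly take the cluster with the largest sum of squared errors and split it in two with an ordinary 2-means, until you have k clusters. Below is a minimal in-memory sketch in plain Java, not a Hadoop or Spark implementation. The class name BisectingKMeansSketch, its method names, the fixed iteration count, and the "farthest point" seeding are all my own assumptions for illustration. It also assumes your documents are already turned into dense numeric vectors (for example TF-IDF) and uses squared Euclidean distance; for text you may prefer cosine distance on normalized vectors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical single-machine sketch of bisecting k-means on dense vectors.
class BisectingKMeansSketch {
    static final int ITERATIONS = 20; // assumed fixed number of 2-means passes

    // Mean of a non-empty list of equal-length vectors.
    static double[] centroid(List<double[]> pts) {
        double[] c = new double[pts.get(0).length];
        for (double[] p : pts)
            for (int i = 0; i < c.length; i++) c[i] += p[i] / pts.size();
        return c;
    }

    // Squared Euclidean distance between two vectors.
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Sum of squared distances from each point to the cluster centroid.
    static double sse(List<double[]> pts) {
        double[] c = centroid(pts);
        double s = 0;
        for (double[] p : pts) s += sqDist(p, c);
        return s;
    }

    // Split one cluster in two with plain 2-means.
    // If no split is possible, the second half comes back empty.
    static List<List<double[]>> bisect(List<double[]> pts, Random rnd) {
        // Seed 1: a random point. Seed 2: the point farthest from seed 1.
        double[] c0 = pts.get(rnd.nextInt(pts.size()));
        double[] c1 = c0;
        for (double[] p : pts)
            if (sqDist(p, c0) > sqDist(c1, c0)) c1 = p;

        List<double[]> a = pts, b = new ArrayList<>();
        for (int it = 0; it < ITERATIONS; it++) {
            List<double[]> na = new ArrayList<>(), nb = new ArrayList<>();
            for (double[] p : pts)
                (sqDist(p, c0) <= sqDist(p, c1) ? na : nb).add(p);
            if (na.isEmpty() || nb.isEmpty()) break; // keep the previous split
            a = na;
            b = nb;
            c0 = centroid(a);
            c1 = centroid(b);
        }
        List<List<double[]>> halves = new ArrayList<>();
        halves.add(a);
        halves.add(b);
        return halves;
    }

    // Bisecting k-means: repeatedly split the cluster with the largest SSE.
    // Assumes data is non-empty; may return fewer than k clusters if the
    // remaining clusters cannot be split any further.
    static List<List<double[]>> cluster(double[][] data, int k, long seed) {
        Random rnd = new Random(seed);
        List<double[]> all = new ArrayList<>();
        for (double[] row : data) all.add(row);
        List<List<double[]>> clusters = new ArrayList<>();
        clusters.add(all);

        while (clusters.size() < k) {
            int worst = 0;
            for (int i = 1; i < clusters.size(); i++)
                if (sse(clusters.get(i)) > sse(clusters.get(worst))) worst = i;
            List<List<double[]>> halves = bisect(clusters.get(worst), rnd);
            if (halves.get(1).isEmpty()) break; // nothing left to split
            clusters.remove(worst);
            clusters.addAll(halves);
        }
        return clusters;
    }
}
```

You would call it as BisectingKMeansSketch.cluster(vectors, k, seed), where vectors holds one row per document. Picking the cluster with the largest SSE is only one possible rule; splitting the largest cluster by size is another common choice. To scale this out, the assignment and centroid-update steps inside bisect are the parts you would distribute.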
