How to implement mini-batch k-means using Apache Spark MLlib?

Question

I have implemented k-means using Spark. Because my dataset is huge and has a very large number of features, I want to implement mini-batch k-means using Apache Spark MLlib. Is there an example or documentation showing how to do this?

score 0 · Answer 1 · answered Aug 17 '17 at 05:30

The paper below doesn't cover Apache Spark MLlib, but it does walk through the mini-batch k-means algorithm:

Sculley, David. “Web-Scale K-Means Clustering.” In Proceedings of the 19th International Conference on World Wide Web, 1177–1178. ACM, 2010. http://dl.acm.org/citation.cfm?id=1772862
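
For orientation, here is a minimal sketch of the update rule as I understand it from that paper: sample a small batch, assign each sampled point to its nearest center, then move that center toward the point with a per-center learning rate of 1/count. This is plain single-machine Python, not Spark MLlib code, and the function names and parameters (`mini_batch_kmeans`, `batch_size`, `iterations`, `seed`) are my own illustrative choices.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, batch_size, iterations, seed=0):
    """Sketch of mini-batch k-means; names and defaults are assumptions."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct points drawn from the data.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Cache the assignments before any center in this batch moves.
        assignments = [nearest_center(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            # Gradient step: pull the center toward the point.
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers
```

Calling `mini_batch_kmeans(data, k=10, batch_size=1000, iterations=100)` on a list of numeric tuples returns `k` center vectors. To port this to Spark, the batch would presumably come from sampling the distributed dataset each iteration, with the (small) set of centers kept on the driver, but that part is an assumption rather than something the paper covers.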
