
I need to classify data points that arrive over time. Streaming K-Means would work if I knew in advance how many distinct classes (clusters) to expect in my data. Is there any way to use Spark MLlib 'out of the box' to run a streaming clustering algorithm when the number of clusters is unknown?
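To show why the cluster count has to be known up front, here is a minimal pure-Python sketch of a mini-batch streaming k-means update with a decay (forgetting) factor. It is loosely modeled on the weighted-average update that MLlib describes for its streaming k-means, but it is not Spark code. The function names (`closest`, `streaming_kmeans_update`) and the exact weight bookkeeping are assumptions made for illustration. The key point is that the list of centers is created once with k entries and every batch only moves those k centers; nothing in the update ever adds or removes one.

```python
import math

def closest(centers, point):
    """Index of the center nearest to point (squared Euclidean distance)."""
    best, best_d = 0, math.inf
    for i, c in enumerate(centers):
        d = sum((a - b) ** 2 for a, b in zip(c, point))
        if d < best_d:
            best, best_d = i, d
    return best

def streaming_kmeans_update(centers, weights, batch, decay=1.0):
    """One mini-batch update; the number of centers k never changes.

    centers: list of k center vectors (lists of floats).
    weights: list of k accumulated (decayed) point counts.
    decay:   forgetting factor in [0, 1]; 1.0 weights all history equally.
    Hypothetical helper for illustration, not an MLlib API.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assign each point in the batch to its nearest existing center.
    for p in batch:
        j = closest(centers, p)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    new_centers, new_weights = [], []
    for j in range(k):
        old_w = weights[j] * decay  # discount the history of this center
        m = counts[j]
        if m == 0:
            # No new points: keep the center, only its weight decays.
            new_centers.append(list(centers[j]))
            new_weights.append(old_w)
            continue
        total = old_w + m
        # Weighted average of the old center and the batch points it received.
        new_centers.append(
            [(centers[j][d] * old_w + sums[j][d]) / total for d in range(dim)]
        )
        new_weights.append(total)
    return new_centers, new_weights
```

For example, `streaming_kmeans_update([[0.0], [10.0]], [1.0, 1.0], [[1.0], [9.0], [11.0]])` moves the first center to `0.5` and keeps the second at `10.0`, but the result still has exactly two centers.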

  • Do you need to experiment and change the number of clusters as data continues to arrive? If so, at what point can you "freeze" the number of clusters? If not, what guidance do you give the algorithm for cluster density and cohesion? – Prune Apr 29 '16 at 23:10
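The comment's point about "guidance for cluster density and cohesion" can be made concrete with a small sketch. The following is a plain-Python "leader" style sequential clustering, a different technique from k-means and not something MLlib provides as far as this sketch assumes. Instead of a fixed k, it takes a distance `threshold`: a point joins the nearest centroid if it is within that distance, otherwise it starts a new cluster. The function name `leader_cluster` and the threshold rule are hypothetical choices for illustration; the threshold is exactly the kind of density/cohesion guidance the algorithm needs in place of k, and results can depend on the order in which points arrive.

```python
import math

def leader_cluster(stream, threshold):
    """Online clustering where the number of clusters is not fixed in advance.

    Hypothetical illustration (not an MLlib API): each point joins the
    nearest existing centroid if its Euclidean distance is <= threshold,
    otherwise it starts a new cluster. Returns (centroids, counts, labels).
    """
    centroids, counts, labels = [], [], []
    for p in stream:
        # Find the nearest existing centroid, if any.
        best, best_d = None, math.inf
        for i, c in enumerate(centroids):
            d = math.dist(c, p)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:
            # Too far from every centroid: open a new cluster.
            centroids.append(list(p))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            n = counts[best]
            # Incremental mean: move the centroid 1/n of the way toward p.
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], p)]
            labels.append(best)
    return centroids, counts, labels
```

With `threshold=2`, the stream `[[0, 0], [0.5, 0], [10, 10], [10, 11], [0, 1]]` yields two clusters with labels `[0, 0, 1, 1, 0]`. "Freezing" the number of clusters, as the comment suggests, would amount to stopping the creation of new clusters after some point and only updating the existing centroids.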
