
The Spark ml library proudly advertises its model-selection capabilities. I thought it would fit my use case:

  1. In the big-data world: train on many, many labeled data points, do clever model selection by tuning parameters etc., and save the best model to disk.
  2. Outside the big-data world: load the model from disk and run a web service (or something) that uses that model to label unlabeled data points.
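Step 1 is the part Spark is built for. As a sketch of what I mean (the file paths, column names, and choice of LogisticRegression here are just illustrative assumptions, not my actual pipeline):

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("train").getOrCreate()
// Assumed input: a parquet file with "features" and "label" columns.
val training = spark.read.parquet("training.parquet")

val lr = new LogisticRegression()
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

// Cross-validated model selection over the parameter grid.
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

// Persist the selected model so another process can load it later.
cv.fit(training).write.overwrite().save("/models/best")
```

It's step 2, serving predictions from that saved model, where I get stuck.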

Looking at the somewhat older mllib library, it seems to understand my use case: there's a function predict(testData: Vector): Double that doesn't seem to require a running Spark context.
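In other words, with mllib a single prediction looks like a plain local method call (the model path and feature values below are made up; loading the model from disk does still take a SparkContext):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

// Loading needs a SparkContext, but predict() itself is an in-process call
// on a local Vector -- no RDD, no cluster round-trip.
val model = LogisticRegressionModel.load(sc, "/models/mllib-lr")
val label: Double = model.predict(Vectors.dense(0.5, 1.2, -0.3))
```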

But then it turns out that the models from the more recent (and recommended) ml library have no transform() function for a single data point. All of their functions require a DataFrame, which in turn requires a Spark context, and I don't want my web server to have to run inside one.
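So the closest I can get in ml is wrapping a single point in a one-row DataFrame, which only works with a live SparkSession (again, paths and values here are illustrative):

```scala
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.linalg.Vectors

// Even one prediction goes through a DataFrame, so a SparkSession
// (here assumed to be in scope as `spark`) must be running.
val model = LogisticRegressionModel.load("/models/ml-lr")
val single = spark
  .createDataFrame(Seq(Tuple1(Vectors.dense(0.5, 1.2, -0.3))))
  .toDF("features")
model.transform(single).select("prediction").show()
```

That per-request DataFrame plus SparkSession is exactly the overhead I was hoping to avoid in the web server.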

What am I missing about Spark and machine learning? Is the functionality I want simply not implemented yet, or is it infeasible to use Spark's machine learning library like this, classifying data points one by one?

Tarrasch
