
I'd like to train a model using Spark MLlib, but then be able to export the model in a platform-agnostic format. Essentially, I want to decouple how models are created from how they are consumed.

My reason for wanting this decoupling is so that I can deploy a model in other projects. E.g.:

  • Use the model to perform predictions in a separate standalone program which doesn't depend on Spark for the evaluation.
  • Use the model with existing projects such as OpenScoring and provide APIs which can make use of the model.
  • Load an existing model back into Spark for high throughput prediction.

Has anyone done something like this with Spark MLlib?

trianta2
  • You can try [jpmml](https://github.com/jpmml). I have no practical experience with jpmml, but I think it is what you need, whether you are using Java or Scala. – eliasah Apr 15 '15 at 16:48
  • I was looking into JPMML, but I didn't see any clear out-of-the-box approach for converting MLlib models to JPMML – trianta2 Apr 15 '15 at 17:29
  • you have to read the documentation... – eliasah Apr 15 '15 at 17:45
  • 1
    Have you seen the following Github issue: https://github.com/apache/spark/pull/3062#discussion_r19769621 – user1808924 Apr 15 '15 at 17:59
  • 1
    @eliasah what documentation exactly are you referring to? user1808924 I have not seen that issue. That PR appears to tackle the serialization of learners but not of transformers, so it looks like I would need to fork spark and develop PMML serialization logic for each additional feature transformer (scaling, feature extraction, etc.) – trianta2 Apr 15 '15 at 18:57

1 Answer


Spark 1.4 now has support for this: see the latest documentation on PMML model export. Not all models are supported yet (see the JIRA issue SPARK-4587 for the ones still to come).
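As a minimal sketch of what the export looks like, assuming Spark 1.4's MLlib (the KMeans model, toy data, and output paths below are just placeholders; `toPMML` comes from the `PMMLExportable` trait that supported models mix in):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object PmmlExportExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("pmml-export").setMaster("local[*]"))

    // Toy training data; KMeans is one of the models that supports PMML export
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
    val model = KMeans.train(points, 2, 10)

    // PMMLExportable provides several toPMML variants:
    val pmmlXml: String = model.toPMML()  // PMML document as an in-memory string
    model.toPMML("/tmp/kmeans.pmml")      // write to a local file path (placeholder)
    // model.toPMML(sc, "/tmp/kmeans")    // write via the Hadoop filesystem (e.g. HDFS)

    println(pmmlXml)
    sc.stop()
  }
}
```

The resulting PMML is a plain XML document, so it can be consumed outside Spark, e.g. evaluated with JPMML-Evaluator or served through Openscoring, which covers the first two use cases in the question.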

HTH

user2051561