The workflow:
- We preprocess our raw data with PySpark; Spark is required because of the size of the data.
- The PySpark preprocessing job uses a pipeline model, which lets you export the preprocessing logic to a file.
- By exporting the preprocessing logic as a pipeline model, you can load that same model at inference time, so you don't have to implement your preprocessing logic twice.
- At inference time, we would prefer to run the preprocessing step without a Spark context. Starting a Spark context is unnecessary overhead at inference time and adds latency to each prediction.
I was looking at MLeap, but as far as I can tell it only supports Scala for inference without a Spark context. Since we use PySpark, it would be nice to stay in Python.
Question: What is a good alternative that lets you build a pipeline model in (Py)Spark during the training phase and then reuse that pipeline model from Python at inference time, without needing a Spark context?