Error when importing Sparkling Water (H2O) pipeline in Apache Spark: py4j.protocol.Py4JError

Question

I recently created a PySpark pipeline that uses Sparkling Water's AutoML as the last stage (very similar to https://github.com/h2oai/sparkling-water/blob/master/py/examples/pipelines/ham_or_spam_multi_algo.py). Fitting and saving the model works, but when I load the saved model from disk I get this error:

Ex:

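from pyspark.ml import PipelineModel

# Fit the pipeline, save the fitted model, then load it back from disk
model = loaded_pipeline.fit(data)
model.write().overwrite().save("examples/build/model")
loaded_model = PipelineModel.load("examples/build/model")  # raises the Py4JError below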


py4j.protocol.Py4JError: ai.h2o.sparkling.ml.models.H2OMOJOModel.H2OSupervisedMOJOModel does not exist in the JVM

I have the following packages/versions installed: H2O (3.28.0.3), h2o-pysparkling-2-4 (3.28.0.3-1), PySpark (2.4.3), Py4j (0.10.7). The error only started after I updated H2O/Sparkling Water to version 3.28. Could it be related to an environment variable or a package version?

score 0 · Answer 1 · answered Feb 16 '20 at 23:49

Please run from pysparkling import * at the beginning of the code. This call ensures that we add Sparkling Water dependencies to the Spark app.

answered Feb 16 '20 at 23:49

Jakub Háva


I had that line at the beginning of the code. After some debugging, I realized that some transformers in the pipeline (not from the pysparkling package) were causing the error. Once I removed them, the import worked just fine. Thank you anyway :) – luis_ferreira223 Feb 17 '20 at 16:54
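
For anyone trying to narrow down which stage is involved, one possible approach is to read the class names straight from the saved model directory before calling PipelineModel.load. The sketch below is only an illustration and is not from the original thread: the helper name list_stage_classes is hypothetical, and it assumes the usual Spark ML persistence layout, where each stage is saved under stages/<index>_<uid>/metadata/part-00000 as a single JSON line with a "class" field. It uses only the standard library, so it needs neither a Spark session nor a JVM.

```python
import json
from pathlib import Path


def list_stage_classes(model_dir):
    """Return (stage directory name, class name) pairs for a saved PipelineModel.

    Hypothetical helper. Assumes each stage was written to
    <model_dir>/stages/<index>_<uid>/metadata/part-* as one JSON line
    that contains a "class" field.
    """
    stages_dir = Path(model_dir) / "stages"
    if not stages_dir.is_dir():
        return []

    classes = []
    for stage_dir in sorted(stages_dir.iterdir()):
        metadata_dir = stage_dir / "metadata"
        if not metadata_dir.is_dir():
            continue
        # "part-*" skips the hidden .crc checksum files written next to the data
        for part in sorted(metadata_dir.glob("part-*")):
            lines = part.read_text(encoding="utf-8").splitlines()
            if not lines:
                continue
            metadata = json.loads(lines[0])
            classes.append((stage_dir.name, metadata.get("class", "<unknown>")))
            break
    return classes
```

Calling list_stage_classes("examples/build/model") and printing the result shows which class each stage was saved as, which makes it easier to spot the stages that do not come from pysparkling or from Spark itself.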
