I am trying to migrate a model from Hadoop to GCP. The model's MOJO will not be retrained. I am running the model on Dataproc, submitted through Airflow with spark-submit. The source data format matches the Hadoop source. When I run the model, I get this error:

Caused by: hex.genmodel.easy.exception.PredictUnknownCategoricalLevelException: Unknown categorical level (my_column,Y)

This column has the same values as on Hadoop, where everything works fine. The model was created with H2O version 3.30.0.4, and the MOJO version is 1.4.

On the Dataproc cluster I am using "PIP_PACKAGES": "h2o_pysparkling_3.1"

I am not sure what the issue is. Please help.

trougc

1 Answer

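Can you please try enabling convertUnknownCategoricalLevelsToNa? With this setting, categorical levels the model does not recognize are converted to NA instead of raising an exception.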

Here is the related documentation: https://s3.amazonaws.com/h2o-release/sparkling-water/spark-3.1/3.42.0.2-1-3.1/doc/deployment/load_mojo.html#customizing-the-mojo-settings
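
For illustration, here is a minimal sketch of how that setting might be passed when loading the MOJO in PySparkling. It assumes the `h2o_pysparkling_3.1` package is installed on the cluster and that the job runs inside an active Spark session; the helper name `load_mojo_with_na_fallback` and the `gs://` path in the usage note are hypothetical placeholders, not taken from the question.

```python
def load_mojo_with_na_fallback(mojo_path):
    """Load an H2O MOJO so that unknown categorical levels become NA.

    mojo_path is a placeholder for wherever the MOJO zip is stored.
    """
    # Imported inside the function so that PySparkling is only required
    # when the helper is actually called on the cluster.
    from pysparkling.ml import H2OMOJOModel, H2OMOJOSettings

    # Convert categorical levels the model does not recognize to NA
    # instead of throwing PredictUnknownCategoricalLevelException.
    settings = H2OMOJOSettings(convertUnknownCategoricalLevelsToNa=True)
    return H2OMOJOModel.createFromMojo(mojo_path, settings)
```

It could then be used as `model = load_mojo_with_na_fallback("gs://your-bucket/path/to/model.zip")` followed by `model.transform(df)`. Note that this only changes how unrecognized levels are handled; if `Y` is expected to be a known level, it may still be worth checking why the model does not recognize it on Dataproc.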

krasinski