
I have a process that performs anomaly detection with Isolation Forest (pysparkling) on a PySpark DataFrame. It runs several steps, including:

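    from pysparkling import H2OContext

    # initialise H2O on top of the running Spark session
    hc = H2OContext.getOrCreate()
    # convert the Spark DataFrame (df) to an H2OFrame
    h2o_df = hc.asH2OFrame(df)
    #
    # predictions and transformations stored in h2o_df
    #
    # convert the result back to a Spark DataFrame
    ret_df = hc.asSparkFrame(h2o_df)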

The last line fails with the following error:

    File predict.py, line 169, in get_predictions
    preds_df = hc.asSparkFrame(ml_df_h2o)
    File "/tmp/…/ deps.zip/ai/h2o/sparkling/H2OContext.py", line 175, in asSparkFrame
    File "/…/dist/ deps.zip/py4j/java_gateway.py", line 1323, in call
    File "/cs/cloudera/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.1004-2-1.p0.32140249/lib/spark3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    File "/cs/orcdev/yash/orc_anomaly_detector/backend/orc_ad_package/dist/orc_ad_glass_deps.zip/py4j/protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o160.asSparkFrame.
    : ai.h2o.sparkling.backend.exceptions.RestApiCommunicationException: H2O URL responded with
    Status code: 500 : Server Error
    Server error: {"__meta":{"schema_version":3,"schema_name":"H2OErrorV3","schema_type":"H2OError"},"timestamp":1691575019939,"error_url":"/3/Frames/py_7_sid_9282/summary","msg":"\n\nERROR MESSAGE:\n\nDistributedException from URL: 'Index 2278544 out of bounds for length 2278544',

I started seeing this error after I moved from JDK 8 to JDK 11 on my Linux box.

I also tried rebuilding the dependency zip package using JDK 11, but the process fails.
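
Since the error appeared after the JDK change, it may help to confirm which Java major version a given machine actually picks up. Below is a minimal sketch for that check. The helper names `parse_java_major` and `local_java_major` are hypothetical and not part of pysparkling or H2O, and the sketch only reports what `java -version` prints; it does not establish that the JDK is the cause of the failure.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(f"no Java version found in: {version_output!r}")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_372" means Java 8.
    # Modern scheme: "11.0.19" means Java 11.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def local_java_major(java_cmd: str = "java") -> int:
    """Run `java -version` on this machine and return the major version."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [java_cmd, "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr or result.stdout)
```

Calling `local_java_major()` on the driver host and on a worker host (assuming `java` is on the `PATH` there) would show whether both resolve to the same JDK.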
