Not able to save pyspark iforest model using pyspark

Question

I am using iforest as described here: https://github.com/titicaca/spark-iforest, but model.save() throws an exception:

Exception: scala.NotImplementedError: The default jsonEncode only supports string, vector and matrix. org.apache.spark.ml.param.Param must override jsonEncode for java.lang.Double.

I followed the code snippet under the "Python API" section of that GitHub page.

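from pyspark.ml.feature import VectorAssembler
from pyspark_iforest.ml.iforest import IForest

# df is an existing DataFrame with this schema:
#   col_1:integer
#   col_2:integer
#   col_3:integer
in_cols = ["col_1", "col_2", "col_3"]

# Combine the input columns into a single vector column named "features"
assembler = VectorAssembler(inputCols=in_cols, outputCol="features")
featurized = assembler.transform(df)

iforest = IForest(contamination=0.5, maxDepth=2)
model = iforest.fit(featurized)

# This call raises the exception shown above
model.save("model_path")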

I expect model.save() to write the model files to the given path.

Below is the output dataframe I'm getting after executing model.transform(df):

col_1:integer
col_2:integer
col_3:integer
features:udt
anomalyScore:double
prediction:double

I have a similar problem when using the Scala version. I created an issue: https://github.com/titicaca/spark-iforest/issues/15 – florins Jul 01 '19 at 14:10

score 1 · Answer 1 · answered Jul 02 '19 at 09:58


I have just fixed this issue. It was caused by an incorrect param type. You can check out the latest code from the master branch and try again.
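For readers wondering how a wrong param type can break saving: the error message says the generic Param has no jsonEncode for a Double, which suggests a double-valued parameter was declared with the generic type rather than a typed one. The sketch below is a toy Python model of that mechanism, not the actual Spark or spark-iforest code. The class names Param and DoubleParam mirror Spark's, while json_encode, save_params, and the parameter name example_param are hypothetical and exist only for this illustration.

```python
import json


class Param:
    """Toy generic parameter: by default only string values can be serialized."""

    def __init__(self, name):
        self.name = name

    def json_encode(self, value):
        if isinstance(value, str):
            return json.dumps(value)
        raise NotImplementedError(
            f"{type(self).__name__} must override json_encode for {type(value).__name__}"
        )


class DoubleParam(Param):
    """Toy typed parameter that overrides json_encode for float values."""

    def json_encode(self, value):
        return json.dumps(float(value))


def save_params(params):
    """Serialize every (param, value) pair, roughly what a model writer does on save."""
    return {p.name: p.json_encode(v) for p, v in params}


# A float held by the generic Param fails only when the model is saved...
try:
    save_params([(Param("example_param"), 0.5)])
    generic_error = None
except NotImplementedError as exc:
    generic_error = str(exc)

# ...while the same value held by the typed DoubleParam serializes without error.
typed_result = save_params([(DoubleParam("example_param"), 0.5)])
```

This would also explain why fit() and transform() work while only save() fails: in this toy model the encoder is not called until the parameters are written out.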


F.Z.Yang


Thanks Yang. I'll try with new source. – Sandie Jul 03 '19 at 11:48
Hey Yang. The new code is working fine. Thanks a lot for addressing this. – Sandie Jul 10 '19 at 08:49
