I am using iforest as described here: https://github.com/titicaca/spark-iforest
However, model.save() throws the following exception:
Exception: scala.NotImplementedError: The default jsonEncode only supports string, vector and matrix. org.apache.spark.ml.param.Param must override jsonEncode for java.lang.Double.
I followed the code snippet in the "Python API" section of that GitHub page:
from pyspark.ml.feature import VectorAssembler
import os
import tempfile
from pyspark_iforest.ml.iforest import *
# Schema of the input dataframe df:
#   col_1: integer
#   col_2: integer
#   col_3: integer
assembler = VectorAssembler(inputCols=in_cols, outputCol="features")
# Add the "features" vector column that IForest is fitted on
df = assembler.transform(df)
iforest = IForest(contamination=0.5, maxDepth=2)
model = iforest.fit(df)
model.save("model_path")
Expected behavior: model.save() should save the model files.
Below is the schema of the output dataframe I get after executing model.transform(df):
col_1:integer
col_2:integer
col_3:integer
features:udt
anomalyScore:double
prediction:double
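
In case the full stack trace helps with diagnosis, here is a minimal sketch of how the save call could be wrapped to capture it. The helper name `try_save` is hypothetical and not part of pyspark or spark-iforest; it only assumes the model object exposes `save(path)` as in the snippet above.

```python
import traceback


def try_save(model, path):
    """Call model.save(path) and report the outcome.

    Returns (True, "") on success, or (False, <formatted traceback>) if
    save() raises. `try_save` is a hypothetical helper for this report,
    not a pyspark or spark-iforest API.
    """
    try:
        model.save(path)
        return True, ""
    except Exception:
        # Keep the whole traceback so it can be pasted into the report.
        return False, traceback.format_exc()
```

Calling `ok, details = try_save(model, "model_path")` and printing `details` when `ok` is false should give the complete Python-side trace of the failure.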