
I have trained a GBTClassifier in Spark 1.6 using the Pipeline abstraction, and I am confused about how to save it.

If I do:

GBTClassificationModel gbt = trainClassifierGBT(data);
// Console output from training:
//   Model Accuracy = 0.8306451612903226
//   Test Error = 0.16935483870967738
GradientBoostedTreesModel oldGBT = gbt.toOld();
oldGBT.save(jsc.sc(), "data/gbtModel");

I get:

java.lang.NullPointerException

If I do:

PipelineModel pipeModel = pipeline.fit(training);
pipeline.save("data/gbtModel");

I get:

Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. 

I will test the solution from the related question "Spark ML Pipeline api save not working", but I wonder whether this can be solved another way.


1 Answer

As of now (Spark 1.6.0 / 2.0.0 SNAPSHOT) this is not possible, because GBTClassificationModel is not MLWritable and the toOld method you are trying to use is private in ML.

If you want to save your models, you'll have to use the MLlib model directly, which is savable:

final GradientBoostedTreesModel model = ...;
model.save(jsc.sc(), "some-path");
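
For a fuller picture, here is a minimal sketch of that route, assuming Spark 1.6 with spark-mllib on the classpath. The class name SaveGbtModel, the helper trainSaveLoad, the toy data and the parameter values are all made up for illustration; only the MLlib calls themselves (BoostingStrategy.defaultParams, GradientBoostedTrees.train, model.save, GradientBoostedTreesModel.load) come from the RDD-based API.

```java
import java.nio.file.Files;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.GradientBoostedTrees;
import org.apache.spark.mllib.tree.configuration.BoostingStrategy;
import org.apache.spark.mllib.tree.model.GradientBoostedTreesModel;

// Hypothetical example class, not part of Spark.
public class SaveGbtModel {

    // Trains an MLlib (RDD-based) GBT classifier, saves it to `path`,
    // and loads it back. `path` must not exist yet, otherwise save() fails.
    static GradientBoostedTreesModel trainSaveLoad(
            JavaSparkContext jsc, JavaRDD<LabeledPoint> training, String path) {
        BoostingStrategy strategy = BoostingStrategy.defaultParams("Classification");
        strategy.setNumIterations(3);                  // illustrative value
        strategy.getTreeStrategy().setNumClasses(2);   // binary labels 0.0 / 1.0
        strategy.getTreeStrategy().setMaxDepth(2);     // illustrative value

        GradientBoostedTreesModel model = GradientBoostedTrees.train(training, strategy);
        model.save(jsc.sc(), path);
        return GradientBoostedTreesModel.load(jsc.sc(), path);
    }

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("SaveGbtModel").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            // Toy data, made up for this sketch.
            JavaRDD<LabeledPoint> training = jsc.parallelize(Arrays.asList(
                new LabeledPoint(0.0, Vectors.dense(0.0, 0.1)),
                new LabeledPoint(0.0, Vectors.dense(0.2, 0.0)),
                new LabeledPoint(1.0, Vectors.dense(1.0, 0.9)),
                new LabeledPoint(1.0, Vectors.dense(0.8, 1.0))));

            // Fresh, not-yet-existing directory for the saved model.
            String path = Files.createTempDirectory("gbt").resolve("gbtModel").toString();

            GradientBoostedTreesModel loaded = trainSaveLoad(jsc, training, path);
            System.out.println("Trees in reloaded model: " + loaded.numTrees());
        } finally {
            jsc.stop();
        }
    }
}
```

Note that this trains on an RDD of LabeledPoint rather than on a DataFrame, so any feature transformations done by earlier Pipeline stages would have to be applied separately before training and before prediction.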
zero323