Is it possible to save GBTClassifier in Spark 1.6?

Question

I have trained a GBTClassifier in Spark 1.6 using the Pipeline abstraction, and I am confused about how to save it.

If I do:

GBTClassificationModel gbt = trainClassifierGBT(data);
// Console output from training:
//   Model Accuracy = 0.8306451612903226
//   Test Error = 0.16935483870967738
GradientBoostedTreesModel oldGBT = gbt.toOld();
oldGBT.save(jsc.sc(), "data/gbtModel");

I get:

java.lang.NullPointerException

If I do:

PipelineModel pipeModel = pipeline.fit(training);
pipeline.save("data/gbtModel");

I get:

Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable.

I will test the solution from the related question "Spark ML Pipeline api save not working", but I wonder whether this can be solved another way.

Answer

As of now (Spark 1.6.0 / 2.0.0 SNAPSHOT) this is not possible, because GBTClassificationModel is not MLWritable and the toOld method you are trying to use is private in ML.

If you want to save your model, you'll have to use the MLlib model directly, which is savable:

final GradientBoostedTreesModel model = ...;
model.save(jsc.sc(), "some-path");

answered Mar 01 '16 at 10:47

zero323


Thanks. But honestly, what good is it if you can't save your trained models? No pun against you, but this is really crazy. – Abdul Merzoug Mar 01 '16 at 12:47
Well, to be fair this is still an `@experimental` feature. – zero323 Mar 01 '16 at 13:07
You can still serialize the pipeline / model even though the model is not Writable, it requires a little more work though. – Ulysse Mizrahi Jul 06 '16 at 07:59
@UlysseMizrahi Not in the general case. You won't be able to do it with distributed models. – zero323 Jul 06 '16 at 11:03
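
The serialization route mentioned in the comments can be sketched with plain Java serialization. This is only a minimal illustration: StandInModel, saveModel and loadModel are hypothetical names that do not come from Spark, and the stand-in class replaces a real fitted model so that the example runs without Spark on the classpath. It assumes the object being saved is local (not distributed) and implements java.io.Serializable, so it does not address the distributed case raised above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ModelSerializationSketch {

    // Hypothetical stand-in for a fitted, local (non-distributed) model.
    // It is NOT a Spark class; it only exists to make the sketch runnable.
    static final class StandInModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double[] weights;

        StandInModel(double[] weights) {
            this.weights = weights.clone();
        }

        // Toy prediction: each weight multiplied by x, then summed.
        double predict(double x) {
            double sum = 0.0;
            for (double w : weights) {
                sum += w * x;
            }
            return sum;
        }
    }

    // Writes any Serializable object to a file with Java serialization.
    static void saveModel(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Reads the object back; the caller casts it to the expected type.
    static Object loadModel(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("gbtModel", ".ser");
        file.deleteOnExit();

        saveModel(new StandInModel(new double[] {0.5, 0.25}), file);
        StandInModel restored = (StandInModel) loadModel(file);

        // The restored object should behave like the one that was saved.
        System.out.println(restored.predict(2.0));
    }
}
```

With Spark, the fitted model object would take the place of StandInModel, assuming it is Serializable. Files written this way are tied to the exact class versions used to write them, so they are less portable than the built-in save format of the MLlib model.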
