I would like to build a gradient-boosted tree classifier with PySpark for a multiclass classification task. I have tried:
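from pyspark.ml.classification import GBTClassifier, OneVsRest
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
# Wrap the GBT classifier in OneVsRest so it can handle more than two classes
gb = GBTClassifier(maxIter=10)
ovr = OneVsRest(classifier=gb)
ovrModel = ovr.fit(trainingData)
gb_predictions = ovrModel.transform(valData)
# Accuracy of the one-vs-rest model on the validation set
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
gb_accuracy = evaluator.evaluate(gb_predictions)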
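For context, OneVsRest reduces a K-class problem to K binary problems (class k vs. everything else) and, at prediction time, picks the class whose binary model reports the highest raw score. The sketch below illustrates that reduction in plain Python; it is not the Spark implementation, the helper names one_vs_rest_labels and one_vs_rest_predict are hypothetical, and the toy scores stand in for the outputs of trained binary models.

```python
from typing import List, Sequence


def one_vs_rest_labels(labels: Sequence[float], k: int) -> List[float]:
    """Binary labels for the hypothetical 'class k vs. rest' sub-problem."""
    return [1.0 if label == k else 0.0 for label in labels]


def one_vs_rest_predict(scores_per_class: Sequence[float]) -> int:
    """Return the index of the class whose binary model gave the highest raw score."""
    # max() keeps the first index on ties
    return max(range(len(scores_per_class)), key=lambda k: scores_per_class[k])


labels = [0.0, 1.0, 2.0, 1.0]
num_classes = 3
# Conceptually, one binary training target per class
for k in range(num_classes):
    print(k, one_vs_rest_labels(labels, k))
# Toy raw scores standing in for the outputs of three trained binary models
print(one_vs_rest_predict([-0.3, 1.2, 0.4]))  # 1
```

The prediction step only works if every binary model exposes a comparable raw score, which is why the wrapped classifier's output columns matter to OneVsRest.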
When I run the code above, I get this error:
numClasses = int(dataset.agg({labelCol: "max"}).head()["max("+labelCol+")"]) + 1
AssertionError: Classifier <class 'pyspark.ml.classification.GBTClassifier'> doesn't extend from HasRawPredictionCol.
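The first traceback line shows how OneVsRest derives the number of classes: the maximum value in the label column plus one. That rule treats labels as 0-based class indices, so gaps in the label values produce empty classes. A small pure-Python sketch of the same rule, using hypothetical helpers (infer_num_classes, labels_are_indices) and plain lists in place of the Spark label column:

```python
from typing import Sequence


def infer_num_classes(labels: Sequence[float]) -> int:
    """Number of classes as the traceback line computes it: max label + 1."""
    return int(max(labels)) + 1


def labels_are_indices(labels: Sequence[float]) -> bool:
    """True if every label is a non-negative whole number, i.e. a usable class index."""
    return all(label >= 0 and float(label).is_integer() for label in labels)


print(infer_num_classes([0.0, 1.0, 2.0, 1.0]))  # 3
print(infer_num_classes([0.0, 2.0, 5.0]))       # 6: classes 1, 3 and 4 are empty
print(labels_are_indices([0.0, 1.0, 2.0]))      # True
print(labels_are_indices([-1.0, 0.5]))          # False
```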
The error is raised by the ovrModel = ovr.fit(trainingData) line, but I don't understand what is wrong with the training data.