I would like to build a gradient-boosted tree classifier in PySpark for a multiclass classification task. I have tried:

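from pyspark.ml.classification import GBTClassifier, OneVsRest
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Wrap the binary GBT classifier in one-vs-rest for the multiclass task
gb = GBTClassifier(maxIter=10)
ovr = OneVsRest(classifier=gb)
ovrModel = ovr.fit(trainingData)  # the AssertionError is raised here
gb_predictions = ovrModel.transform(valData)
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
gb_accuracy = evaluator.evaluate(gb_predictions)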

When I run the code above, I get this error:

numClasses = int(dataset.agg({labelCol: "max"}).head()["max("+labelCol+")"]) + 1
AssertionError: Classifier <class 'pyspark.ml.classification.GBTClassifier'> doesn't extend from HasRawPredictionCol.

The error is raised by the ovrModel = ovr.fit(trainingData) line, but I don't understand what is wrong with the training data.

Simone
