pyspark random forest regressor predict multiclass

Question

I have a RandomForestRegressor PySpark ML model. The response variable has 9 classes.

When I predict on the test data I get what look like probabilities, but I need the classes instead.

Code used:

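from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor

# featureIndexer, train and test are defined earlier (not shown).
rf = RandomForestRegressor(featuresCol="scaled_features")
pipeline = Pipeline(stages=[featureIndexer, rf])

# Train model.  This also runs the indexer.
model = pipeline.fit(train)

# Make predictions.
predictions = model.transform(test)

# Root mean squared error between "label" and the numeric "prediction".
evaluator = RegressionEvaluator(labelCol="label", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)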

You sound confused; a regressor (such as RF here) does **not** return probabilities, but simply numerical values. If your problem is a classification one, you should use the respective classifier, and not a regressor. — desertnaut, Jun 24 '20 at 17:39
Thanks for the clarification, but my target variable has 9 classes. I need to use a regressor, not a classifier. However, only 2 classes are predicted for my test data; the model is not predicting the other classes. — Naveen Srikanth, Jun 24 '20 at 18:04
I'm afraid you still sound confused. You are simply in a multi-class classification setting (with 9 classes), which is classification nevertheless, and *not* regression. By definition, you cannot get probability values (let alone classes) from a regression model. — desertnaut, Jun 24 '20 at 22:21
Thank you, yes, I got confused because my input label has classes [0-9]. With a regression fit I expected the predicted results to be in the range 0-9. When I looked at the predictions, they were only values like 0.1 and 1.3 (0.xxx and 1.xxx), so I took them for probabilities. But you have clarified that these are not probability values. — Naveen Srikanth, Jun 25 '20 at 07:03
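
To illustrate the point made in the comments, here is a simplified sketch in plain Python (not Spark code). The list `tree_outputs` is a made-up example of what five individual trees might return for one row. A regression forest averages such values, which is why fractional predictions like 1.4 appear; a classification forest combines them into a single class label. Spark's own classifier aggregates per-tree class distributions rather than taking a bare majority vote, so treat the vote below as a conceptual stand-in.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-tree outputs for a single test row (labels in 0-9).
tree_outputs = [0, 1, 3, 1, 2]

# Regression forest: the prediction is the average of the tree outputs,
# so it is a real number, not a class and not a probability.
regression_prediction = mean(tree_outputs)

# Classification forest (simplified): the prediction is the most common class.
classification_prediction = Counter(tree_outputs).most_common(1)[0][0]

print(regression_prediction)      # 1.4
print(classification_prediction)  # 1
```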

score 0 · Answer 1 · answered Jun 24 '20 at 18:00

You are working on a classification problem, so you should use RandomForestClassifier as the ML algorithm rather than a regressor.

For the evaluation, you should use MulticlassClassificationEvaluator instead of RegressionEvaluator.

Danylo Baibak


I am treating my target class as continuous with 10 classes. When using RandomForestRegressor I get predictions for only 2 classes; how can I make the model predict all classes? – Naveen Srikanth Jun 24 '20 at 18:07
