I have a RandomForestRegressor PySpark ML model. The response variable has 9 classes.

When I predict on the test data I get probabilities; I need to get the classes instead.

Code used:

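from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor

# featureIndexer, train, and test are defined elsewhere (not shown in the question).
rf = RandomForestRegressor(featuresCol="scaled_features")
pipeline = Pipeline(stages=[featureIndexer, rf])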

# Train model.  This also runs the indexer.
model = pipeline.fit(train)

# Make predictions.
predictions = model.transform(test)

evaluator = RegressionEvaluator(labelCol="label", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
desertnaut
Naveen Srikanth
  • You sound confused; a regressor (such as RF here) does **not** return probabilities, but simply numerical values. If your problem is a classification one, you should use the respective classifier, and not a regressor. – desertnaut Jun 24 '20 at 17:39
  • Thanks for the clarification, but my target variable has 9 classes. I need to use a regressor, not a classifier. However, only 2 classes are predicted for my test data; the model is not predicting the other classes. – Naveen Srikanth Jun 24 '20 at 18:04
  • I'm afraid you still sound confused. You are simply in a multi-class classification setting (with 9 classes), which is classification nevertheless, and *not* regression. By definition, you cannot get probability values (let alone classes) from a regression model. – desertnaut Jun 24 '20 at 22:21
  • Thank you. Yes, I got confused because my input label has classes [0-9]. With a regression fit I expected the predicted results to be in the range 0-9. When I looked at the predictions, they were values like 0.1 and 1.3, only 0.xxx and 1.xxx, so I took them for probabilities. But you have clarified that these are not probability values. – Naveen Srikanth Jun 25 '20 at 07:03

1 Answer

You are working on a classification problem, so you should use RandomForestClassifier as the ML algorithm.

For the evaluation, you should use MulticlassClassificationEvaluator.

Danylo Baibak
  • I am treating my target class as continuous with 10 classes. When using RandomForestRegressor I get predictions for only 2 classes. How can I make the model predict all classes? – Naveen Srikanth Jun 24 '20 at 18:07