I am using Python to train a random forest on an imbalanced dataset with a binary target class. I want to change the default probability threshold of 0.5 to another value to improve recall and precision. So far I cannot find any built-in method or class for this. Could anyone please advise a method, or does this mean I have to code it myself? Cheers

LUSAQX

1 Answer

The RandomForestClassifier of scikit-learn has no threshold parameter for assigning a class to a sample. As you can see in the source code of RandomForestClassifier.predict, it simply returns the most likely class, that is, the one with the highest predicted probability averaged over the trees. Of course you can use the approach suggested by @tihom, but I can hardly imagine that this will improve precision and recall.

For instance, if your chosen threshold is 0.7 and the class probabilities are 0.6 and 0.4, which class do you assign? None at all?
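For a binary target, one common reading of "changing the threshold" is to apply the cutoff only to the positive-class probability from predict_proba, so that any sample below it falls to the negative class. The sketch below illustrates that reading; it is not a built-in RandomForestClassifier feature. The synthetic dataset, the helper name predict_with_threshold, and the three cutoffs are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 10% positives); a stand-in for the real dataset.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Column 1 holds the probability of the positive class (clf.classes_[1]).
proba_pos = clf.predict_proba(X_test)[:, 1]


def predict_with_threshold(proba, threshold):
    """Hypothetical helper: label 1 if P(positive) >= threshold, else 0."""
    return (np.asarray(proba) >= threshold).astype(int)


# Compare precision and recall at a few illustrative cutoffs.
for threshold in (0.3, 0.5, 0.7):
    y_pred = predict_with_threshold(proba_pos, threshold)
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred)
    print(f"threshold={threshold}: precision={precision:.3f}, recall={recall:.3f}")
```

Lowering the cutoff can only add positive predictions, so recall cannot decrease, while precision typically moves the other way. Any cutoff chosen this way should be picked on validation data rather than on the test set.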

As an alternative, you can try to use the class_weight option of RandomForestClassifier to put more weight on your underrepresented class.
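A minimal sketch of the class_weight suggestion, assuming a synthetic imbalanced dataset in place of the real one. class_weight="balanced" reweights classes inversely to their frequency in the training data; an explicit dict such as {0: 1, 1: 5} is another option, and the values in it are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 10% positives); a stand-in for the real dataset.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# "balanced" weights each class inversely to its frequency in y_train.
weighted_clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0)
weighted_clf.fit(X_train, y_train)

# Per-class precision and recall on the held-out split.
y_pred = weighted_clf.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```

Whether this helps depends on the data, so it is worth comparing the report against an unweighted forest trained on the same split.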

René
  • I agree. Changing prediction probabilities is not a robust way to improve precision/recall for imbalanced classes. You will need to change your model training strategy, for example by adjusting class weights (as suggested by René) or changing the sampling frequencies. – tihom Dec 31 '16 at 19:07