I use Python to run a random forest on an imbalanced dataset with a binary target class. I want to change the default probability threshold of 0.5 to another value to raise recall and precision. So far I cannot find any method or class that can be used for this task. Could anyone please advise a method, or does this mean I should code it myself? Cheers
Viewed 2,998 times
Which library are you running? Python doesn't have "the random forest". – Ami Tavory Dec 31 '16 at 09:25
@AmiTavory I use 'from sklearn.ensemble import RandomForestClassifier' – LUSAQX Dec 31 '16 at 09:26
You can get the probabilities using `p = clf.predict_proba(X)` and then compute `Y = p[:, 1] > custom_value` – tihom Dec 31 '16 at 09:30
@tihom Right. So there is no predefined method for this, and I should code it manually? – LUSAQX Dec 31 '16 at 09:32
@LUSAQX I am not aware of any predefined method or parameter that does this out of the box – tihom Dec 31 '16 at 09:48
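The thresholding approach from the comments can be sketched as follows. This is a minimal sketch, assuming a binary problem in which class `1` is the positive (minority) class. `predict_with_threshold` is a hypothetical helper written for this example, not a scikit-learn API, and the synthetic data from `make_classification` and the threshold of 0.3 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def predict_with_threshold(clf, X, threshold, positive_label=1):
    """Hypothetical helper (not part of scikit-learn): predict `positive_label`
    whenever its predicted probability is at least `threshold`."""
    classes = list(clf.classes_)
    if len(classes) != 2:
        raise ValueError("this sketch only handles binary classification")
    # Columns of predict_proba follow the order of clf.classes_.
    pos_col = classes.index(positive_label)
    neg_label = classes[1 - pos_col]
    proba_pos = clf.predict_proba(X)[:, pos_col]
    return np.where(proba_pos >= threshold, positive_label, neg_label)


# Synthetic imbalanced data (about 90% class 0, 10% class 1), for illustration only.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

y_pred_default = clf.predict(X_test)  # most likely class per sample
y_pred_low = predict_with_threshold(clf, X_test, threshold=0.3)  # arbitrary threshold
```

Lowering the threshold below 0.5 can only keep or raise recall on the positive class and usually lowers precision, so the value should be chosen on held-out data (for example with `sklearn.metrics.precision_recall_curve`), not on the final test set. Recent scikit-learn releases (1.5 onwards) also provide `FixedThresholdClassifier` and `TunedThresholdClassifierCV` in `sklearn.model_selection` for this purpose; check that your installed version has them.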
1 Answer
The RandomForestClassifier of scikit-learn has no fixed threshold for assigning a class to a sample. As you can see in the source code of RandomForestClassifier.predict, it simply returns the class with the highest predicted probability (averaged over the trees). Of course you can use the approach suggested by @tihom, but I can hardly imagine that this will improve precision and recall.
For instance, if your chosen threshold is 0.7 and the class probabilities are 0.6 and 0.4, which class do you assign? None at all?
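The point about `predict` can be checked directly. Assuming a single-output classifier, rebuilding the predictions from `predict_proba` with `np.argmax` should reproduce `clf.predict`. The synthetic data is only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic binary problem, for illustration only.
X, y = make_classification(n_samples=300, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Mean of the per-tree class probabilities, shape (n_samples, 2).
proba = clf.predict_proba(X)

# Most likely class per row; on an exact tie np.argmax picks the first column.
manual = clf.classes_[np.argmax(proba, axis=1)]

matches_predict = np.array_equal(manual, clf.predict(X))
```

Because `np.argmax` returns the first maximal index, an exact 0.5/0.5 tie goes to `clf.classes_[0]`.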
As an alternative, you can try the `class_weight` option of RandomForestClassifier to put more weight on your underrepresented class.
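A minimal sketch of the `class_weight` route follows, assuming class `1` is the minority class. The synthetic data and the example weight dict are illustrative only, and whether weighting helps depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 90% class 0, 10% class 1), for illustration only.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" weights each class inversely to its frequency in y_train;
# an explicit dict such as {0: 1, 1: 5} also works (the 5 is an arbitrary example).
weighted = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
weighted.fit(X_train, y_train)

y_pred = weighted.predict(X_test)
# zero_division=0 avoids a warning if no sample is predicted positive.
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred)
```

`class_weight="balanced_subsample"` recomputes the weights on each tree's bootstrap sample instead. In either case, compare against the unweighted model on held-out data before relying on it.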

René
I agree. Changing prediction probabilities is not a robust way to improve precision/recall for imbalanced classes. You will need to update your model training strategy like playing with class weights (as suggested by Rene) or changing the sampling frequencies. – tihom Dec 31 '16 at 19:07
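"Changing the sampling frequencies" can be as simple as random oversampling of the minority class. The following is a minimal sketch using `sklearn.utils.resample`, assuming class `1` is the minority class; the synthetic data is illustrative only. Only the training split is resampled, so the test set keeps the original class balance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data (about 90% class 0, 10% class 1), for illustration only.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Split the training set by class; never resample the test set.
X_min, X_maj = X_train[y_train == 1], X_train[y_train == 0]

# Random oversampling: draw minority rows with replacement until both classes match in size.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int), np.ones(len(X_min_up), dtype=int)])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
y_pred = clf.predict(X_test)
```

The separate imbalanced-learn package offers ready-made samplers (for example `RandomOverSampler`) that do the same thing with less manual bookkeeping.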