How to set up my own probabilistic threshold in random forest?

Question

I use Python to run a random forest on an imbalanced dataset with a binary target class. I want to change the default probability threshold of 0.5 to another value to raise recall and precision. So far I cannot find any built-in method or class for this. Could anyone please advise a method, or do I need to code it myself? Cheers

Which library are you running? Python doesn't have "the random forest". — Ami Tavory, Dec 31 '16 at 09:25
@AmiTavory I use 'from sklearn.ensemble import RandomForestClassifier' — LUSAQX, Dec 31 '16 at 09:26
you can get the probabilities using `p = clf.predict_proba(X)` and then compute `Y = p[:, 1] > custom_value` (column 1 holds the probability of the positive class) — tihom, Dec 31 '16 at 09:30
@tihom right. So no defined method can be used and I should manually code for that? — LUSAQX, Dec 31 '16 at 09:32
@LUSAQX I am not aware of any defined method or parameter to do this out of box — tihom, Dec 31 '16 at 09:48
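A minimal sketch of the manual approach from the comments, assuming a synthetic imbalanced dataset from `make_classification` in place of the asker's data and an illustrative threshold of 0.3. The helper `predict_with_threshold` is hypothetical, not part of scikit-learn; it labels a sample positive when the predicted probability of class 1 reaches the chosen threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary data (about 90% / 10%), standing in for the real dataset
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

def predict_with_threshold(model, X, threshold):
    """Hypothetical helper: predict 1 when P(class 1) >= threshold, else 0."""
    # predict_proba returns one column per class, in the order of model.classes_
    pos_col = list(model.classes_).index(1)
    proba_pos = model.predict_proba(X)[:, pos_col]
    return (proba_pos >= threshold).astype(int)

y_default = clf.predict(X_test)                             # argmax over class probabilities
y_custom = predict_with_threshold(clf, X_test, threshold=0.3)  # illustrative threshold

for name, pred in [("default (argmax)", y_default), ("threshold=0.3", y_custom)]:
    print(name,
          "precision:", precision_score(y_test, pred, zero_division=0),
          "recall:", recall_score(y_test, pred))
```

Lowering the threshold typically raises recall at the cost of precision, so the value is best chosen on validation data rather than the test set. Recent scikit-learn releases (1.5 and later) also provide `FixedThresholdClassifier` and `TunedThresholdClassifierCV` in `sklearn.model_selection` for this purpose; check the documentation of your installed version.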

score 1 · Answer 1 · answered Dec 31 '16 at 10:21

The RandomForestClassifier of scikit-learn has no fixed threshold for assigning a class to a sample. As you can see in the source code of RandomForestClassifier.predict, it simply returns the most likely class. Of course you can use the approach suggested by @tihom, but I can hardly imagine that this will improve precision and recall.

For instance, if your chosen threshold is 0.7 and the class probabilities are 0.6 and 0.4, which class do you assign? None at all?

As an alternative, you can try to use the class_weight option of RandomForestClassifier to put more weight on your underrepresented class.
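A short sketch of the `class_weight` option, again assuming synthetic data from `make_classification`. `"balanced"` weights each class inversely to its frequency in the training labels; the explicit `{0: 1, 1: 5}` dictionary is an arbitrary illustrative ratio, not a recommended value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 90% / 10%); replace with your own X, y
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# "balanced": weights inversely proportional to class frequencies in y_train
balanced = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
balanced.fit(X_train, y_train)

# Explicit weights: the minority class 1 counts 5x as much (illustrative choice)
weighted = RandomForestClassifier(n_estimators=200, class_weight={0: 1, 1: 5}, random_state=0)
weighted.fit(X_train, y_train)

recalls = {}
for name, model in [("balanced", balanced), ("{0: 1, 1: 5}", weighted)]:
    recalls[name] = recall_score(y_test, model.predict(X_test))
    print(name, "recall on class 1:", recalls[name])
```

RandomForestClassifier also accepts `class_weight="balanced_subsample"`, which recomputes the balanced weights on each tree's bootstrap sample.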

I agree. Changing prediction probabilities is not a robust way to improve precision/recall for imbalanced classes. You will need to update your model training strategy like playing with class weights (as suggested by Rene) or changing the sampling frequencies. — tihom, Dec 31 '16 at 19:07
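One way to change the sampling frequencies is random undersampling of the majority class before training. This sketch assumes synthetic data where class 1 is the minority, and uses `sklearn.utils.resample` without replacement; only the training split is resampled so the test set keeps the real class distribution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data (about 90% / 10%); class 1 is the minority here
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Split the training set by class
X_min = X_train[y_train == 1]
X_maj = X_train[y_train == 0]

# Draw a random subset of the majority class, without replacement, matching the minority size
X_maj_down = resample(X_maj, replace=False, n_samples=len(X_min), random_state=0)

# Recombine into a balanced training set
X_bal = np.vstack([X_maj_down, X_min])
y_bal = np.concatenate([np.zeros(len(X_maj_down), dtype=int),
                        np.ones(len(X_min), dtype=int)])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_bal, y_bal)
y_pred = clf.predict(X_test)  # evaluated on the untouched, still-imbalanced test set
```

Undersampling discards majority-class data; oversampling the minority class (`resample(..., replace=True)`) is the mirror-image option.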
