How to get decision function in randomforest in sklearn

Question

I am using the following code to find the optimised parameters for RandomForestClassifier using GridSearchCV.

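from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# X and y are the feature matrix and binary target, defined elsewhere.
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
rfc = RandomForestClassifier(random_state=42, class_weight='balanced')
param_grid = {
    'n_estimators': [200, 500],
    # 'auto' meant the same as 'sqrt' for classifiers and was removed in scikit-learn 1.3
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy'],
}
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# Pass the splitter itself so the shuffled stratified folds are actually used
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=k_fold, scoring='roc_auc')
CV_rfc.fit(x_train, y_train)
print(CV_rfc.best_params_)
print(CV_rfc.best_score_)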

Now I want to evaluate the tuned model on x_test. To do that, I did the following:

pred = CV_rfc.decision_function(x_test)
print(roc_auc_score(y_test, pred))

However, RandomForestClassifier does not seem to support decision_function, as I got the following error:

AttributeError: 'RandomForestClassifier' object has no attribute 'decision_function'.

Is there any other way of doing this?

I am happy to provide more details if needed.

Aren't you looking for `CV_rfc.predict(x_test)`? – gmds Apr 10 '19 at 06:07

Venkatachalam · Accepted Answer · 2019-04-10T07:20:12.167

12

If your intention is to get a model score that can be passed to roc_auc_score, then you can use predict_proba(). For a binary problem, pass the probability of the positive class, which is the second column of the output:

y_pred_proba = CV_rfc.predict_proba(x_test)
print(roc_auc_score(y_test, y_pred_proba[:,1]))
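As a self-contained illustration of the same idea, here is a minimal sketch. It assumes synthetic binary data from make_classification in place of the question's X and y, and a much smaller parameter grid so that it runs quickly. The names search, has_decision_function and test_auc are illustrative only and do not come from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data standing in for the question's X and y (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately small grid so the sketch runs quickly.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42, class_weight="balanced"),
    param_grid={"n_estimators": [20, 50], "max_depth": [4, 6]},
    cv=3,
    scoring="roc_auc",
)
search.fit(x_train, y_train)

# Random forests expose predict_proba but not decision_function.
has_decision_function = hasattr(search.best_estimator_, "decision_function")

# Columns of predict_proba follow search.classes_; column 1 is the positive class here.
y_pred_proba = search.predict_proba(x_test)
test_auc = roc_auc_score(y_test, y_pred_proba[:, 1])
print(has_decision_function, round(test_auc, 3))
```

Because roc_auc_score only needs a score that ranks positives above negatives, the positive-class probability can play the role that decision_function plays for other classifiers.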

edited Apr 10 '19 at 07:20

answered Apr 10 '19 at 06:16

please let me know if you know an answer for this: https://stackoverflow.com/questions/55609339/how-to-perform-feature-selection-with-gridsearchcv thank you very much :) – EmJ Apr 10 '19 at 21:57
How can we use this function for GMM clustering? – imhans33 Nov 14 '21 at 18:45
roc_auc_score is meant for classification problems. For clustering, you can refer to the documentation [here](https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation) – Venkatachalam Nov 16 '21 at 10:36
I compared the output of `metrics.roc_curve()` using confidence scores from `decision_function()` and probability scores from `predict_proba()` in a logistic regression classifier, and they both produced the same result. – Herman Autore May 16 '22 at 21:46
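
The observation in the last comment can be checked with a short sketch. It assumes synthetic data and a default LogisticRegression. In that setting the positive-class probability is a monotonic transform of the decision_function score, so both should rank the samples identically and yield the same ROC AUC. The variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption, not the commenter's data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)

scores = clf.decision_function(x_test)    # signed distance to the decision boundary
probas = clf.predict_proba(x_test)[:, 1]  # monotonic transform of the same scores

# ROC AUC depends only on how the samples are ranked, not on the scale of the scores.
auc_from_scores = roc_auc_score(y_test, scores)
auc_from_probas = roc_auc_score(y_test, probas)
print(auc_from_scores, auc_from_probas)
```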

score 2 · Answer 2 · answered Apr 10 '19 at 06:09

You can either call predict() on the fitted GridSearchCV object directly, or retrieve the optimized random forest model through best_estimator_ and call its methods.
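
A minimal sketch of both routes, assuming synthetic data and a tiny grid. The names search and best_rf are illustrative. With the default refit=True, GridSearchCV retrains the best configuration on the whole training set, and its predict method delegates to that refitted model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data standing in for the question's X and y (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"max_depth": [4, 6]},
    cv=3,
    scoring="roc_auc",
)
search.fit(x_train, y_train)  # refit=True (the default) retrains the best model on x_train

best_rf = search.best_estimator_          # a fitted RandomForestClassifier
pred_via_search = search.predict(x_test)  # delegates to best_estimator_.predict
pred_via_model = best_rf.predict(x_test)
print(type(best_rf).__name__, np.array_equal(pred_via_search, pred_via_model))
```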

Kiruparan Balachandran

score 2 · Answer 3 · answered Apr 10 '19 at 06:18

Your code,

pred = CV_rfc.decision_function(x_test)
print(roc_auc_score(y_test, pred))

makes me think that you are trying to make predictions with the trained model.

If you want the predicted class labels, you can do this:

pred = CV_rfc.predict(x_test)

Then the output will be class labels like [1, 2, 1, ... ]

If you want class probabilities instead, you can use predict_proba like this:

pred = CV_rfc.predict_proba(x_test)
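
To make the difference between the two outputs concrete, here is a minimal sketch on synthetic data. It assumes a plain RandomForestClassifier rather than the question's grid search, and the names rf, labels and proba are illustrative. It also shows how the predicted labels relate to the probabilities for a single-output forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the question's X and y (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=42).fit(x_train, y_train)

labels = rf.predict(x_test)       # shape (n_samples,), one class label per row
proba = rf.predict_proba(x_test)  # shape (n_samples, n_classes), columns ordered as rf.classes_

# The predicted label is the class whose averaged tree probability is highest.
labels_from_proba = rf.classes_[np.argmax(proba, axis=1)]
print(labels[:5])
print(proba[:5].round(2))
```

Each row of proba sums to 1, and for ROC AUC on a binary problem only the positive-class column is needed.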
