8

I am using the following code to find the optimised hyperparameters for a RandomForestClassifier using GridSearchCV.

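from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
rfc = RandomForestClassifier(random_state=42, class_weight='balanced')
param_grid = {
    'n_estimators': [200, 500],
    # 'auto' was an alias for 'sqrt' in classifiers; it was deprecated and later removed from scikit-learn
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy']
}
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# Pass the splitter itself; cv=10 would ignore the shuffled k_fold defined above
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=k_fold, scoring='roc_auc')
CV_rfc.fit(x_train, y_train)
print(CV_rfc.best_params_)
print(CV_rfc.best_score_)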

Now I want to evaluate the tuned model on x_test. To do that, I tried the following:

pred = CV_rfc.decision_function(x_test)
print(roc_auc_score(y_test, pred))

However, RandomForestClassifier does not seem to support decision_function, as I got the following error:

AttributeError: 'RandomForestClassifier' object has no attribute 'decision_function'.

Is there any other way of doing this?

I am happy to provide more details if needed.

Venkatachalam
EmJ

3 Answers

12

If your intention is to get a continuous model score that can be passed to roc_auc_score, you can use predict_proba() and take the column for the positive class:

y_pred_proba = CV_rfc.predict_proba(x_test)
print(roc_auc_score(y_test, y_pred_proba[:,1]))
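
For readers without the asker's data, here is a minimal, self-contained sketch of the same idea. It assumes synthetic binary data from make_classification in place of X and y, and uses a deliberately tiny grid (the values are illustrative, not the asker's) so it runs quickly. The positive-class column is looked up through classes_ rather than hard-coded.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Assumption: synthetic binary data stands in for the asker's X and y.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Assumption: a small illustrative grid, chosen only to keep the run short.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 5]}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=42, class_weight="balanced"),
    param_grid=param_grid,
    cv=cv,
    scoring="roc_auc",
)
search.fit(x_train, y_train)

# predict_proba returns one column per class, ordered as in search.classes_.
proba = search.predict_proba(x_test)
positive_col = list(search.classes_).index(1)
test_auc = roc_auc_score(y_test, proba[:, positive_col])
print(search.best_params_, test_auc)
```

Note that best_score_ is the mean cross-validated AUC on the training folds, so it will generally differ somewhat from the AUC computed on x_test.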
Venkatachalam
  • please let me know if you know an answer for this: https://stackoverflow.com/questions/55609339/how-to-perform-feature-selection-with-gridsearchcv thank you very much :) – EmJ Apr 10 '19 at 21:57
  • How can we use this function for GMM clustering. – imhans33 Nov 14 '21 at 18:45
  • roc_auc_score is meant for classification problems. For clustering, you can refer [here](https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation) – Venkatachalam Nov 16 '21 at 10:36
  • I compared the output of `metrics.roc_curve()` using confidence scores from `decision_function()` and probability scores from `predict_proba()` in a logistic regression classifier, and they both produced the same result. – Herman Autore May 16 '22 at 21:46
2

You can either call the predict() method on the fitted GridSearchCV object directly, or retrieve the optimised random forest model through its best_estimator_ attribute and predict with that.
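
A rough sketch of both routes, assuming synthetic data and a one-parameter grid chosen only for illustration. With the default refit=True, GridSearchCV refits the best configuration on the whole training set, so the two routes should give identical labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic data stands in for the asker's X and y.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=30, random_state=42),
    param_grid={"max_depth": [2, 4]},  # illustrative grid only
    cv=3,
    scoring="roc_auc",
)
search.fit(x_train, y_train)  # refit=True by default

# The forest refit on all of x_train using best_params_.
best_rf = search.best_estimator_

# search.predict forwards to best_rf.predict, so the labels match.
labels_via_search = search.predict(x_test)
labels_via_best = best_rf.predict(x_test)
same = np.array_equal(labels_via_search, labels_via_best)
print(type(best_rf).__name__, same)
```

Keep in mind that predict() returns hard class labels, so for an ROC AUC you would still want the probabilities from predict_proba().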

2

Your code,

pred = CV_rfc.decision_function(x_test)
print(roc_auc_score(y_test, pred))

makes me think that you are trying to make predictions with the trained model.

If you want predicted class labels, you can call predict:

pred = CV_rfc.predict(x_test)

Then the output will be class labels like [1, 2, 1, ... ]

If you want class probabilities instead, you can use predict_proba like this:

pred = CV_rfc.predict_proba(x_test)
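
To make the difference between the two outputs concrete, here is a small sketch on synthetic data (an assumption; a plain RandomForestClassifier is fitted directly to keep it short, and a fitted GridSearchCV forwards both calls to its best estimator in the same way).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data stands in for the asker's X and y.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(x_train, y_train)

labels = clf.predict(x_test)       # shape (n_samples,): one class label per row
proba = clf.predict_proba(x_test)  # shape (n_samples, n_classes): each row sums to 1

# For a single-output forest, the predicted label is the class whose
# averaged probability is highest.
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]
print(labels.shape, proba.shape, np.array_equal(labels, labels_from_proba))
```

The labels discard the ranking information that roc_auc_score relies on, which is why the probability column is the better input for that metric.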
Sreeram TP