I'm using BayesSearchCV from scikit-optimize to train a model on a fairly imbalanced dataset. From what I've been reading, precision or ROC AUC would be the best metrics for an imbalanced dataset. In my code:

from skopt import BayesSearchCV

knn_b = BayesSearchCV(estimator=pipe, search_spaces=search_space, n_iter=40, random_state=7, scoring='roc_auc')
knn_b.fit(X_train, y_train)

The number of iterations is just a value I chose arbitrarily (although I get a warning saying I've already reached the best result, and as far as I'm aware there is no way to stop the search early?). For the scoring parameter, I specified roc_auc, which I assume will be the primary metric used to select the best parameters in the results. So when I call knn_b.best_params_, I should get the parameters for which the roc_auc score is highest. Is that correct?

My confusion comes when I look at the results using knn_b.cv_results_. Shouldn't mean_test_score be the roc_auc score, because of the scoring param in the BayesSearchCV class? What I'm doing is plotting the results and seeing how each combination of params performed.

sns.relplot(
    data=knn_b.cv_results_, kind='line', x='param_classifier__n_neighbors', y='mean_test_score', 
    hue='param_scaler', col='param_classifier__p',
)

When I try to use the roc_auc_score() function on the true and predicted values, I get something completely different.

Is the mean_test_score here different? How would I be able to get the individual/mean roc_auc score of each CV/split of each iteration? Similarly for when I want to use RandomizedSearchCV or GridSearchCV.

EDIT: tl;dr: I want to know what exactly is being computed in mean_test_score. I thought it was roc_auc because of the scoring param, or maybe accuracy, but it seems to be neither.

Callum Matthews
  • What does "completely different" mean? How do you produce your predictions? Please provide a minimal reproducible example. – Ben Reiniger Mar 05 '22 at 22:02
  • I mean that, `roc_auc_score()` gives a value different than the best result in `mean_test_score` I plot. I calculated my predictions using `y_pred = knn_b.predict(X_test)`; this should use the best model from the search, right? I'm also assuming `roc_auc_score(y_test, y_pred)` is the score of the best result from the BayesSearchCV. Here are the steps I did: https://gist.github.com/callmws/dce84ba53ef99b6094b322cd2828c6b4 – Callum Matthews Mar 06 '22 at 08:32
  • You should paste that code into the question. – Ben Reiniger Mar 06 '22 at 17:13

1 Answer


mean_test_score is the AUROC, because of your scoring parameter, yes.
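
To get the individual scores for each CV split of each candidate, look at the split keys that sit alongside mean_test_score in cv_results_; the same keys are produced by RandomizedSearchCV and GridSearchCV. A minimal sketch, assuming the standard scikit-learn cv_results_ layout (split0_test_score, split1_test_score, ...):

import pandas as pd

results = pd.DataFrame(knn_b.cv_results_)
# Each row is one sampled parameter combination; the split*_test_score columns
# hold the AUROC of each CV fold, and mean_test_score is their average.
split_cols = [c for c in results.columns if c.startswith('split') and c.endswith('_test_score')]
print(results[split_cols + ['mean_test_score', 'std_test_score']])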

Your main problem is that the ROC curve (and the area under it) require the probability predictions (or other continuous score), not the hard class predictions. Your manual calculation is thus incorrect.
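
For a binary target, something like the following should give the test-set AUROC (a sketch; it assumes class 1 is the positive class):

from sklearn.metrics import roc_auc_score

# Use the predicted probability of the positive class,
# not the hard labels returned by predict().
test_auc = roc_auc_score(y_test, knn_b.predict_proba(X_test)[:, 1])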

You shouldn't expect exactly the same score anyway. Your second score is on the test set, and the first score is optimistically biased by the hyperparameter selection.
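
Concretely (a sketch; best_score_ is the mean cross-validated AUROC of the selected parameter combination):

# Mean CV AUROC of the winning parameters (optimistically biased by the search)
print(knn_b.best_score_)
# AUROC of the refit best model on the held-out test set
print(roc_auc_score(y_test, knn_b.predict_proba(X_test)[:, 1]))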

Ben Reiniger
  • Oh, that makes sense. If I'm understanding this, and please bear with me: if I calculate the AUC score using `auc_score = roc_auc_score(y_test, pred_probas[:, 1])` where `pred_probas = knn_b.predict_proba(X_test)`, this should give me the test AUC score, while the one I used previously should be the optimistically biased one (so train set?). – Callum Matthews Mar 06 '22 at 18:58
  • a quick follow-up to this: if my dataset is imbalanced (fewer observations where customers churn (1) than not churn (0)). What would be the best scoring criteria to monitor here? I've seen several reasons why this is better than that. I'm assuming it's either precision or F1 correct? – Callum Matthews Mar 07 '22 at 10:35