
I want to run a randomized hyperparameter search for a classification problem, using AUC as the scoring metric instead of accuracy. Here is my code, for reproducibility:

# Define imports and create data 
import numpy as np
from sklearn.ensemble import RandomForestClassifier 
from sklearn.model_selection import RandomizedSearchCV

x = np.random.normal(0, 1, 100)
y = np.random.binomial(1, 0.5, 100)  # binary labels; binomial(0, 1, 100) would yield only zeros


### Let's define parameter grid

rf = RandomForestClassifier(random_state=0)

n_estimators = [int(x) for x in np.linspace(start=200, stop=2000, num=4)]
min_samples_split = [2, 5, 10]
param_grid = {'n_estimators': n_estimators,
               'min_samples_split': min_samples_split}


# Define model
clf = RandomizedSearchCV(rf, 
                         param_grid, 
                         random_state=0, 
                         n_iter=3, 
                         cv=5).fit(x.reshape(-1, 1), y)

According to the documentation of RandomizedSearchCV, I can pass another argument, scoring, which chooses the metric used to evaluate the model. I tried passing scoring = auc, but I got an error saying there is no such metric. What do I have to pass to use AUC instead of accuracy?

Danylo Baibak
John

2 Answers


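According to the documentation of RandomizedSearchCV, scoring can be a string or a callable. The scikit-learn documentation lists all valid string values for the scoring parameter; for AUC the string is 'roc_auc'. A callable has to follow the signature scorer(estimator, X, y). Note that sklearn.metrics.auc does not fit that signature (it computes the area under a curve from x and y coordinates), so it cannot be passed as scoring directly.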
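To illustrate the callable form of scoring, here is a minimal sketch. The function name roc_auc_scorer, the small parameter grid, and the synthetic data are assumptions made for this example, not part of the question; it also assumes the estimator exposes predict_proba, which RandomForestClassifier does.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV


def roc_auc_scorer(estimator, X, y):
    """Hypothetical custom scorer with the (estimator, X, y) signature."""
    # Use the predicted probability of the positive class (column 1)
    proba = estimator.predict_proba(X)[:, 1]
    return roc_auc_score(y, proba)


# Synthetic binary data, only for illustration
rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(100, 1))
y = rng.binomial(1, 0.5, size=100)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [10, 20, 50], "min_samples_split": [2, 5, 10]},
    n_iter=3,
    cv=5,
    random_state=0,
    scoring=roc_auc_scorer,  # callable instead of a string
).fit(X, y)

print(search.best_params_, search.best_score_)
```

In most cases the built-in string is simpler; a callable like this is mainly useful when the metric you need has no predefined name.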

Danylo Baibak

As explained by Danylo and in a related answer, you can set the scoring function of the search to ROC AUC, so that the search picks the parameter combination that maximizes it:

clf = RandomizedSearchCV(rf, 
                         param_grid, 
                         random_state=0, 
                         n_iter=3, 
                         cv=5,
                         scoring='roc_auc').fit(x.reshape(-1, 1), y)
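
Once the search is fitted with scoring='roc_auc', the scores it reports are cross-validated ROC AUC values rather than accuracies. The sketch below shows one way to inspect them; the data and the deliberately small grid are assumptions chosen so the example runs quickly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary data, only for illustration
rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(100, 1))
y = rng.binomial(1, 0.5, size=100)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [10, 20, 50], "min_samples_split": [2, 5, 10]},
    n_iter=3,
    cv=5,
    random_state=0,
    scoring="roc_auc",
).fit(X, y)

# One mean cross-validated ROC AUC per sampled parameter combination
results = search.cv_results_
for params, mean_auc in zip(results["params"], results["mean_test_score"]):
    print(params, round(float(mean_auc), 3))

# best_score_ is the highest of those mean scores
print("best:", search.best_params_, search.best_score_)
```

Since the features here are pure noise, values near 0.5 are what you would expect.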
Learning is a mess