12

I'm trying to find the parameters for my SVM that give me the best AUC, but I can't find any scoring function for AUC in sklearn. Does someone have an idea? Here is my code:

    parameters = {"C":[0.1, 1, 10, 100, 1000], "gamma":[0.1, 0.01, 0.001, 0.0001, 0.00001]}
    clf = SVC(kernel = "rbf")
    clf = GridSearchCV(clf, parameters, scoring = ???)
    svr.fit(features_train , labels_train)
    print svr.best_params_

So what can I use for ??? to get the best parameters for a high AUC score?

julianspaeth

4 Answers

32

You can simply use:

    clf = GridSearchCV(clf, parameters, scoring='roc_auc')
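For example, a minimal runnable sketch (the toy data here stands in for the question's `features_train` / `labels_train`):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # toy data standing in for the question's features_train / labels_train
    X, y = make_classification(n_samples=200, random_state=0)

    parameters = {"C": [0.1, 1, 10, 100, 1000],
                  "gamma": [0.1, 0.01, 0.001, 0.0001, 0.00001]}

    # the built-in 'roc_auc' scorer uses decision_function for SVC,
    # so probability=True is not needed here
    clf = GridSearchCV(SVC(kernel="rbf"), parameters, scoring='roc_auc')
    clf.fit(X, y)
    print(clf.best_params_)  # parameters with the highest mean cross-validated AUC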
piman314
  • So if I print out `svr.best_score_`, is that the AUC? Because I tried to compute it like this: `false_positive_rate, true_positive_rate, thresholds = roc_curve(labels_test, labels_predicted); roc_auc = auc(false_positive_rate, true_positive_rate); print(roc_auc)`, but it shows me a lower AUC than the best score. – julianspaeth Jun 08 '16 at 10:23
  • The best score corresponds to the best average `roc_auc` over each of the folds in the training process. One would expect to see a lower score on a test set. – piman314 Jun 08 '16 at 13:00
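To make the distinction in this comment concrete, a small sketch continuing the snippet above (`features_test` / `labels_test` are a hypothetical held-out split, not from the question):

    from sklearn.metrics import roc_auc_score

    # mean AUC across the CV folds explored during the search
    print(clf.best_score_)

    # AUC on held-out data, usually somewhat lower; GridSearchCV refits the
    # best estimator by default, so clf can score new data directly
    test_scores = clf.decision_function(features_test)
    print(roc_auc_score(labels_test, test_scores))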
11

You can make your own scorer:

    from sklearn.metrics import make_scorer
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
    from sklearn.pipeline import Pipeline
    from sklearn.decomposition import TruncatedSVD
    import xgboost as xgb

    # define the scoring function
    def custom_auc(ground_truth, predictions):
        # use only the positive-class column of the predicted probabilities;
        # you can get an error here when trying to pass both columns at once
        fpr, tpr, _ = roc_curve(ground_truth, predictions[:, 1], pos_label=1)
        return auc(fpr, tpr)

    # wrap it as a standard sklearn scorer
    my_auc = make_scorer(custom_auc, greater_is_better=True, needs_proba=True)

    pipeline = Pipeline(
        [("transformer", TruncatedSVD(n_components=70)),
         ("classifier", xgb.XGBClassifier(scale_pos_weight=1.0, learning_rate=0.1,
                                          max_depth=5, n_estimators=50,
                                          min_child_weight=5))])

    parameters_grid = {'transformer__n_components': [60, 40, 20]}

    grid_cv = GridSearchCV(pipeline, parameters_grid, scoring=my_auc, n_jobs=-1,
                           cv=StratifiedShuffleSplit(n_splits=5, test_size=0.3,
                                                     random_state=0))
    grid_cv.fit(X, y)

For more information, check out the sklearn make_scorer documentation.

Artem Zaika
6

Use the code below, which will give you the full list of available scoring parameters:

    import sklearn

    sklearn.metrics.SCORERS.keys()
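If you only want the AUC-related entries, a quick way to filter them (a small convenience sketch, not part of the original answer):

    import sklearn

    # keep only the scorer names that mention AUC
    print([name for name in sklearn.metrics.SCORERS.keys() if "auc" in name])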

Select the appropriate scorer that you want to use.

In your case, the code below will work:

    clf = GridSearchCV(clf, parameters, scoring='roc_auc')
Sapan Soni
3

I haven't tried this, but I believe you want to use sklearn.metrics.roc_auc_score.

The problem is that it's not a model scorer, so you need to build one. Something like:

    from sklearn.metrics import roc_auc_score

    def score_auc(estimator, X, y):
        # use the probability of the positive class rather than the binary
        # predictions; probabilities give a more realistic, threshold-free score
        y_score = estimator.predict_proba(X)[:, 1]
        return roc_auc_score(y, y_score)

and use this function as the scoring parameter in the GridSearchCV.
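For example, a minimal sketch wiring it to the SVC from the question (note that `probability=True` must be set before `predict_proba` is available, as the comments below point out):

    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    # pass the function itself as the scorer; do not call it
    svr = GridSearchCV(SVC(kernel="rbf", probability=True), parameters,
                       scoring=score_auc)
    svr.fit(features_train, labels_train)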

pekapa
  • Thanks, I like your idea, but if I do this: `svr = GridSearchCV(svr, parameters, scoring = score_auc(svr, features_train, labels_train))` it causes: AttributeError: predict_proba is not available when probability=False. If I set it to True, another error shows up. – julianspaeth Jun 08 '16 at 10:46
  • Just do `svr = GridSearchCV(svr, parameters, scoring=score_auc)`; you shouldn't call the function, just pass it to the search. If `predict_proba` is giving you problems, just score with the regular `predict`. – pekapa Jun 08 '16 at 14:17
  • It feels like this will pass "score_auc" the train data - what if we want to score it on the cross validation data as we go along? – Mohamad Zeina Jun 06 '18 at 01:44
  • For some SVM models, you need to explicitly set the hyperparameter "probability=True" when initializing them, in order to get probabilistic predictions. – GrimSqueaker Aug 05 '21 at 09:14