I am using `RandomizedSearchCV` in sklearn with a `RandomForestClassifier`. To track several metrics I am using custom scorers:

from sklearn.metrics import make_scorer, roc_auc_score, recall_score, matthews_corrcoef, balanced_accuracy_score, accuracy_score

acc = make_scorer(accuracy_score)

auc_score = make_scorer(roc_auc_score)
recall = make_scorer(recall_score)
mcc = make_scorer(matthews_corrcoef)
bal_acc = make_scorer(balanced_accuracy_score)

scoring = {"roc_auc_score": auc_score, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }

These custom scorers are then used for the randomized search:

rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, n_iter=100, cv=split, verbose=2,
                               random_state=42, n_jobs=-1, error_score=np.nan, scoring=scoring, iid=True, refit="roc_auc_score")

Now the problem: since I am using custom splits, the AUC scorer throws an exception whenever a split contains only one class label.

I do not want to change the splits, so is there a way to catch these exceptions within `RandomizedSearchCV` or the `make_scorer` functions? E.g., if one of the metrics cannot be calculated (due to an exception), just record NaN and go on with the next model.

Edit: Apparently `error_score` only covers model training, not the metric calculation. If I use e.g. accuracy, everything works and I just get warnings in the folds that contain only one class label. If I use e.g. AUC as the metric, the exceptions are still raised.
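
For reference, the failure can be reproduced on the metric alone; this minimal snippet (not from the original post) triggers the same ValueError that a single-class fold produces:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 1])         # only one class present, as in a degenerate fold
y_score = np.array([0.2, 0.7, 0.4])
roc_auc_score(y_true, y_score)
# ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.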

Would be great to get some ideas here!

Solution: define a custom scorer that catches the exception:

import numpy as np

def custom_scorer(y_true, y_pred, actual_scorer):
    # Fall back to NaN when the wrapped metric cannot be computed
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except ValueError:
        pass
    return score

This leads to a new set of scorers:

acc = make_scorer(accuracy_score)
recall = make_scorer(custom_scorer, actual_scorer=recall_score)
new_auc = make_scorer(custom_scorer, actual_scorer=roc_auc_score)
mcc = make_scorer(custom_scorer, actual_scorer=matthews_corrcoef)
bal_acc = make_scorer(custom_scorer, actual_scorer=balanced_accuracy_score)

scoring = {"roc_auc_score": new_auc, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }

These can then be passed to the `scoring` parameter of `RandomizedSearchCV`.
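
A minimal sketch of that call (same arguments as the original search, with the exception-safe scoring dict; `X_train`/`y_train` are placeholder names for whatever training data is used):

rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter=100, cv=split, verbose=2, random_state=42,
                               n_jobs=-1, error_score=np.nan, scoring=scoring,
                               iid=True, refit="roc_auc_score")
rf_random.fit(X_train, y_train)  # X_train/y_train: placeholders for the training data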

A second solution I found was:

def custom_auc(clf, X, y_true):
    # Scorer with the (estimator, X, y) signature: computes its own predictions
    score = np.nan
    y_pred = clf.predict_proba(X)
    try:
        score = roc_auc_score(y_true, y_pred[:, 1])
    except Exception:
        pass
    return score

which can also be passed to the `scoring` argument:

scoring = {"roc_auc_score": custom_auc, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }

(Adapted from this answer)

JennyH
  • Not entirely clear what you want. You are using `error_score=np.nan` which will do what you require. Do you need anything else, or is it not working as expected? – Vivek Kumar Dec 10 '18 at 13:13
  • I added the problem above. Basically it is not working as expected since even with error_score I get the exceptions – JennyH Dec 10 '18 at 13:35
  • Oh yes, my bad. The `error_score` will only cover `estimator.fit()`. Can you give an example of "`the AUC is throwing an exception because there is only one class label for this exact split.`"? – Vivek Kumar Dec 10 '18 at 13:47
  • "ValueError: Only one class present in y_true. ROC AUC score is not defined in that case" would be the exception I got (so far). – JennyH Dec 10 '18 at 14:22

1 Answer


You can have a generic scorer which takes another scorer as input, calls it, catches any exception it throws, and returns a fixed value instead.

def custom_scorer(y_true, y_pred, actual_scorer):
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except Exception:
        pass
    return score

Then you can call this using:

acc = make_scorer(custom_scorer, actual_scorer=accuracy_score)
auc_score = make_scorer(custom_scorer, actual_scorer=roc_auc_score,
                        needs_threshold=True)  # <== Added this to get correct roc
recall = make_scorer(custom_scorer, actual_scorer=recall_score)
mcc = make_scorer(custom_scorer, actual_scorer=matthews_corrcoef)
bal_acc = make_scorer(custom_scorer, actual_scorer=balanced_accuracy_score)
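
A note on `needs_threshold=True` (standard `make_scorer` behavior, not spelled out in the original answer): with this flag the scorer feeds continuous decision values, taken from `decision_function` or `predict_proba`, to the wrapped metric instead of hard class predictions, which is what `roc_auc_score` needs to compute a meaningful ROC curve.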

Example to reproduce:

import numpy as np

def custom_scorer(y_true, y_pred, actual_scorer):
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except Exception:
        pass
    return score


from sklearn.metrics import make_scorer, roc_auc_score, accuracy_score
acc = make_scorer(custom_scorer, actual_scorer=accuracy_score)
auc_score = make_scorer(custom_scorer, actual_scorer=roc_auc_score,
                        needs_threshold=True)  # <== Added this to get correct roc

from sklearn.datasets import load_iris
X, y = load_iris().data, load_iris().target

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, KFold
cvv = KFold(3)
params = {'criterion': ['gini', 'entropy']}
gc = GridSearchCV(DecisionTreeClassifier(), param_grid=params, cv=cvv,
                  scoring={"roc_auc": auc_score, "accuracy": acc},
                  refit="roc_auc", n_jobs=-1,
                  return_train_score=True, iid=False)
gc.fit(X, y)
print(gc.cv_results_)
Vivek Kumar
  • I see your idea here and like the workaround. Unfortunately it fails with: in get return _ForkingPickler.loads(res) AttributeError: Can't get attribute 'custom_scorer' and a bit later in the stack trace: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable. Running it with n_jobs = 1 works fine, so I guess it's a problem of multithreading. – JennyH Dec 10 '18 at 14:41
  • @JennyH I am not getting any error even with `n_jobs=-1`. Have you defined the scorer in another file and trying to import it or in the same file? I have updated the minimal example that works. – Vivek Kumar Dec 10 '18 at 15:06
  • And there we go. I still had scikit-learn 0.20.0. Updating to 0.20.1 helped, now it's working like a charm. Sorry and thanks for the MWE! – JennyH Dec 10 '18 at 15:32
  • I let it run overnight, unfortunately I again got an exception thrown: If I use predict_proba=True with AUC (not Threshold because that is not needed for the auc, and gives an error as well) I again end up with: ValueError: got predict_proba of shape (96, 1), but need classifier with two classes for custom_scorer scoring – JennyH Dec 11 '18 at 12:15