
I am looking to perform backward feature selection on a logistic regression with the AUC as the criterion. For building the logistic regression I used scikit-learn, but unfortunately this library does not seem to offer a method for backward feature selection. My dependent variable is a binary banking-crisis indicator and I have 13 predictors. Does anybody have any suggestions on how to handle this?

The code below shows how I compute the AUC. The problem is that I do not know how to decide which feature to prune as being less important than the others.

from sklearn.model_selection import train_test_split
from sklearn import metrics

SEED = 42  # random-seed multiplier for the splits (defined elsewhere in my script)

def cv_loop(X, y, model, N):
    """Return the mean AUC of `model` over N random train/validation splits."""
    mean_auc = 0.
    for i in range(N):
        # hold out 20% of the data as a validation fold
        X_train, X_cv, y_train, y_cv = train_test_split(
            X, y, test_size=.20, random_state=i * SEED)
        model.fit(X_train, y_train)
        # predicted probability of the positive class (crisis = 1)
        preds = model.predict_proba(X_cv)[:, 1]
        fpr, tpr, _ = metrics.roc_curve(y_cv, preds)
        auc = metrics.auc(fpr, tpr)
        print("AUC (fold %d/%d): %f" % (i + 1, N, auc))
        mean_auc += auc
    return mean_auc / N
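A call like the following (assuming X holds the 13 predictors and y the binary crisis indicator; the solver settings and the number of folds are just an example) would give the cross-validated AUC of the full model:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
mean_auc = cv_loop(X, y, model, N=10)  # mean AUC over 10 random 80/20 splits
print("Mean AUC of the full 13-feature model: %f" % mean_auc)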

If you need more background information let me know!

Many thanks in advance,

Joris

1 Answer


scikit-learn has Recursive Feature Elimination (RFE) in its feature_selection module, which almost does what you described.

Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
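For instance, a minimal sketch with a logistic regression as the base estimator (the synthetic data and the choice of 8 features to keep are just placeholders; you would plug in your own data and tune the number):

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# stand-in for your data: 13 predictors, binary target
X, y = make_classification(n_samples=500, n_features=13, random_state=0)

estimator = LogisticRegression(max_iter=1000)
selector = RFE(estimator, n_features_to_select=8)  # 8 is an arbitrary example
selector.fit(X, y)
print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # 1 = selected; higher = eliminated earlier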

This doesn't explicitly optimize the AUC, however. It does the pruning by looking at the coefficients of the logistic regression.
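If you need AUC itself as the elimination criterion, one option is a hand-rolled greedy backward loop: at each step, drop the feature whose removal hurts the cross-validated AUC the least. A minimal sketch (assuming X is a pandas DataFrame of your 13 predictors and y the binary target; the function name, the solver settings, and the 10-fold choice are just assumptions):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def backward_select_auc(X, y, min_features=1, cv=10):
    """Greedy backward elimination using cross-validated ROC AUC as the criterion."""
    model = LogisticRegression(max_iter=1000)
    features = list(X.columns)
    while len(features) > min_features:
        scores = {}
        for f in features:
            candidate = [c for c in features if c != f]
            # mean ROC AUC of the model with feature f removed
            scores[f] = cross_val_score(model, X[candidate], y,
                                        scoring="roc_auc", cv=cv).mean()
        # drop the feature whose removal leaves the highest AUC
        to_drop = max(scores, key=scores.get)
        print("dropping %s, AUC without it: %.4f" % (to_drop, scores[to_drop]))
        features.remove(to_drop)
    return features

In practice you would also record the AUC after each elimination and keep the subset with the best score rather than running all the way down to min_features. Note that RFECV accepts scoring='roc_auc', but there the AUC only chooses how many features to keep; which feature is dropped at each step is still decided by the coefficients.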

Cihan
  • Thanks for the quick reply! I found this module as well, but unfortunately that does not solve my problem, as I need to use AUC as the maximizing criterion. – Joris Rump Jul 05 '20 at 15:00