I would like to perform backward feature selection on a logistic regression, using the AUC as the selection criterion. I built the logistic regression with scikit-learn, but unfortunately the library does not seem to offer a method for backward feature selection. My dependent variable is a binary banking crisis indicator and I have 13 predictors. Does anybody have suggestions on how to handle this?
The code below shows how I compute the AUC. The problem is that I do not know how to decide which feature to prune, i.e. which one is less important than the others.
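from sklearn import metrics
from sklearn.model_selection import train_test_split

SEED = 42  # placeholder value; the actual SEED is defined elsewhere and not shown here

def cv_loop(X, y, model, N):
    """Return the mean AUC over N random 80/20 train/validation splits."""
    mean_auc = 0.
    for i in range(N):
        # A different random_state per iteration gives a different split.
        X_train, X_cv, y_train, y_cv = train_test_split(
            X, y, test_size=.20,
            random_state=i * SEED)
        model.fit(X_train, y_train)
        # Predicted probability of the positive class.
        preds = model.predict_proba(X_cv)[:, 1]
        fpr, tpr, _ = metrics.roc_curve(y_cv, preds)
        auc = metrics.auc(fpr, tpr)
        print("AUC (fold %d/%d): %f" % (i + 1, N, auc))
        mean_auc += auc
    return mean_auc / N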
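One common way to decide which feature to prune is greedy backward elimination: start with all features, score every subset obtained by dropping exactly one feature, remove the feature whose absence gives the best score, and stop once every removal makes the score worse. The block below is only a minimal sketch of that idea, not a scikit-learn API. The names `backward_select`, `score_fn` and `toy_score` are hypothetical, and the toy scorer is made up so the example runs without any data.

```python
def backward_select(n_features, score_fn, min_features=1):
    """Greedy backward elimination over feature indices 0..n_features-1.

    score_fn(cols) must return a score (for example a mean cross-validated
    AUC) for a model fitted on the feature indices in `cols`.
    Higher scores are better.
    """
    selected = list(range(n_features))
    best = score_fn(selected)
    while len(selected) > min_features:
        # Score each subset obtained by dropping exactly one feature.
        trials = [(score_fn([c for c in selected if c != f]), f)
                  for f in selected]
        trial_score, drop = max(trials)
        if trial_score < best:
            break  # every single removal lowers the score, so stop
        # On a tie the feature is still dropped, preferring the smaller model.
        best = trial_score
        selected.remove(drop)
    return selected, best


# Toy scorer, invented for illustration only: features 0 and 2 raise the
# score, every other feature lowers it slightly (like a noise predictor).
def toy_score(cols):
    useful = {0: 0.2, 2: 0.1}
    return 0.5 + sum(useful.get(c, -0.01) for c in cols)


selected, best = backward_select(5, toy_score)
print(selected, round(best, 2))
```

With the `cv_loop` function above, `score_fn` could be something like `lambda cols: cv_loop(X[:, cols], y, model, N)`, assuming `X` is a NumPy array. Note that this refits the model once per candidate feature in every round, so with 13 predictors it can take up to a few dozen calls to `cv_loop`.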
If you need more background information, let me know!
Many thanks in advance,
Joris