I am somewhat disappointed by the results I am getting. I create two models (sklearn.linear_model.LogisticRegression) with C=1e80 and penalty='l1' or penalty='l2', and then evaluate them using sklearn.cross_validation.cross_val_score with cv=3 and scoring='roc_auc'. To me, C=1e80 should result in virtually no regularization, so the AUC should be the same for both. Instead, the model with the 'l2' penalty gives a worse AUC, and multiple runs give me the same result. How does this happen?
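A minimal sketch of the setup being described, on synthetic data (the dataset, the make_classification parameters, and the explicit solver='liblinear' choice are assumptions for illustration, not the asker's actual code; in recent scikit-learn cross_val_score lives in sklearn.model_selection rather than sklearn.cross_validation):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in old releases

    # Synthetic stand-in for the asker's data
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    for penalty in ("l1", "l2"):
        # liblinear supports both penalties; newer releases require an explicit
        # solver choice for penalty='l1'
        model = LogisticRegression(penalty=penalty, C=1e80, solver="liblinear")
        scores = cross_val_score(model, X, y, cv=3, scoring="roc_auc")
        print(penalty, scores.mean())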
– Mikhail Akimov
- Is your data normalized? The scale of C is strongly correlated with the scale of the features – lejlot Feb 07 '16 at 00:43
- Could you post your code and possibly a data sample? – David Maust Feb 07 '16 at 01:08
- Thank you, @lejlot. Normalization really resolved this issue. I didn't think it mattered that much in the cross-validation case... – Mikhail Akimov Feb 08 '16 at 12:35
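A sketch of the fix the comments converge on: standardize the features, and do it inside a Pipeline so that each cross-validation fold is scaled using only its own training split (the pipeline and the synthetic data here are assumptions for illustration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    for penalty in ("l1", "l2"):
        # StandardScaler is re-fit on each training fold inside cross_val_score,
        # so the held-out fold never leaks into the scaling statistics.
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty=penalty, C=1e80, solver="liblinear"),
        )
        print(penalty, cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean())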
1 Answer
Just to make it a bit more clear. The general form of the objective minimized by most regularized models is

    C * SUM_{i=1}^{N} loss(h(x_i), y_i | theta) + regularizer(theta)

thus the whole role of C is to balance the sum of losses over the training samples against the value of the regularizer.

Now, if the loss is bounded or grows slowly (as with logistic regression), then without proper normalization the L2 regularizer (||theta||^2) may grow towards infinity, so you need a very high C to make it irrelevant and thus obtain a solution equal to the L1 one (SUM_j |theta_j|). Similarly, if you have a loss which grows very fast, such as an Lp loss with p >= 2, then the regularizer may be comparatively tiny, so you need a very small C to make it do anything.
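A direct transcription of that objective for logistic regression, just to make the balance concrete (the function name, the {-1, +1} label convention, and h(x) = x·theta are my own choices for the sketch): when features sit on wildly different scales, theta and hence the regularizer term can be large relative to the data term, and even a very large C may not fully drown it out.

    import numpy as np

    def penalized_objective(theta, X, y, C, penalty="l2"):
        """C * sum of logistic losses + regularizer, with labels y in {-1, +1}."""
        margins = y * (X @ theta)
        # sum_i log(1 + exp(-y_i * x_i.theta)), computed stably
        data_term = np.logaddexp(0.0, -margins).sum()
        reg = np.sum(theta ** 2) if penalty == "l2" else np.sum(np.abs(theta))
        return C * data_term + reg

    # Tiny usage example with made-up numbers
    theta = np.array([0.5, -1.2, 3.0])
    X = np.array([[1.0, 2.0, 0.5], [0.2, -1.0, 1.5]])
    y = np.array([1, -1])
    print(penalized_objective(theta, X, y, C=1e80, penalty="l1"))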

– lejlot