I am somewhat disappointed by the results I am getting. I create two models (sklearn.linear_model.LogisticRegression) with C=1e80 and penalty='l1' or penalty='l2', and then evaluate them using sklearn.cross_validation.cross_val_score with cv=3 and scoring='roc_auc'. To me, C=1e80 should result in virtually no regularization, so the AUC should be the same for both. Instead, the model with the 'l2' penalty gives a worse AUC, and multiple runs give me the same result. How does this happen?
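A minimal sketch of the setup being described, on synthetic data (the dataset, the make_classification parameters, and the explicit solver='liblinear' choice are assumptions for illustration, not the asker's actual code; in recent scikit-learn cross_val_score lives in sklearn.model_selection rather than sklearn.cross_validation):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in old releases

    # Synthetic stand-in for the asker's data
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    for penalty in ("l1", "l2"):
        # liblinear supports both penalties; newer releases require an explicit
        # solver choice for penalty='l1'
        model = LogisticRegression(penalty=penalty, C=1e80, solver="liblinear")
        scores = cross_val_score(model, X, y, cv=3, scoring="roc_auc")
        print(penalty, scores.mean())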
– Mikhail Akimov
- Is your data normalized? The scale of C is strongly correlated with the scale of the features – lejlot Feb 07 '16 at 00:43
- Could you post your code and possibly a data sample? – David Maust Feb 07 '16 at 01:08
- Thank you, @lejlot. Normalization really resolved this issue. I didn't think it mattered that much in the cross-validation case... – Mikhail Akimov Feb 08 '16 at 12:35
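A sketch of the fix the comments converge on: standardize the features, and do it inside a Pipeline so that each cross-validation fold is scaled using only its own training split (the pipeline and the synthetic data here are assumptions for illustration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    for penalty in ("l1", "l2"):
        # StandardScaler is re-fit on each training fold inside cross_val_score,
        # so the held-out fold never leaks into the scaling statistics.
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty=penalty, C=1e80, solver="liblinear"),
        )
        print(penalty, cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean())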
1 Answer
Just to make it a bit more clear. The general form of the objective minimized by most regularized models is

    C * SUM_{i=1}^{N} loss(h(x_i), y_i | theta) + regularizer(theta)

thus the whole role of C is to balance the sum of losses over the training samples against the value of the regularizer.

Now, if the loss is bounded or grows slowly (as with logistic regression), then without proper normalization the L2 regularizer (||theta||^2) may grow towards infinity, so you need a very high C to make it irrelevant and thus obtain a solution equal to the L1 one (SUM_j |theta_j|). Similarly, if you have a loss which grows very fast, such as an Lp loss with p >= 2, then the regularizer may be comparatively tiny, so you need a very small C to make it do anything.
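A direct transcription of that objective for logistic regression, just to make the balance concrete (the function name, the {-1, +1} label convention, and h(x) = x·theta are my own choices for the sketch): when features sit on wildly different scales, theta and hence the regularizer term can be large relative to the data term, and even a very large C may not fully drown it out.

    import numpy as np

    def penalized_objective(theta, X, y, C, penalty="l2"):
        """C * sum of logistic losses + regularizer, with labels y in {-1, +1}."""
        margins = y * (X @ theta)
        # sum_i log(1 + exp(-y_i * x_i.theta)), computed stably
        data_term = np.logaddexp(0.0, -margins).sum()
        reg = np.sum(theta ** 2) if penalty == "l2" else np.sum(np.abs(theta))
        return C * data_term + reg

    # Tiny usage example with made-up numbers
    theta = np.array([0.5, -1.2, 3.0])
    X = np.array([[1.0, 2.0, 0.5], [0.2, -1.0, 1.5]])
    y = np.array([1, -1])
    print(penalized_objective(theta, X, y, C=1e80, penalty="l1"))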

– lejlot