
I'm training a simple logistic regression classifier using LIBLINEAR. There are only 3 features, and the label is binary (0/1).

Sample input file:

1   1:355.55660999775586    2:-3.401379785      3:5
1   1:252.43759050148728    2:-3.96044759307    3:9
1   1:294.15085871437088    2:-13.1649273486    3:14
1   1:432.10492221032933    2:-2.72636786196    3:9
0   1:753.80863694081768    2:-12.4841741178    3:14
1   1:376.54927850355756    2:-6.9494008935     3:7

Now, if I use "-s 6", which is "L1-regularized logistic regression", the 10-fold cross-validation accuracy is around 70%, and training finishes within seconds. But if I use "-s 7", which is "L2-regularized logistic regression (dual)", the solver exceeds 1000 iterations and the 10-fold accuracy is only 60%.

Has anybody seen this kind of strange behavior? From my understanding, the only difference between L1 and L2 regularization is whether the penalty term uses abs(x) or pow(x, 2).
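To make that last sentence concrete, here is a tiny sketch of the two penalty terms on a hypothetical weight vector (the weights are made up for illustration):

```python
# L1 penalty sums abs(w_i); L2 penalty sums w_i ** 2.
w = [2.0, -0.5, 0.0]

l1 = sum(abs(wi) for wi in w)   # 2.5
l2 = sum(wi ** 2 for wi in w)   # 4.25

print(l1, l2)
```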

menphix

3 Answers


Thanks for posting this! I work with liblinear fairly often and almost always use L2 loss without thinking about it. This article does a pretty good job explaining the difference: http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/

Based on that, I'm guessing that not only do you have a small number of features but maybe also a small dataset? Have you tried increasing the number of input points?

ABC
  • Thanks for the reply! I have around 300000 training examples, among which 64% are positive. Do you think this is enough? – menphix May 25 '15 at 00:21
  • BTW, very helpful article! – menphix May 25 '15 at 00:21
  • Happy the article helped! Oh yeah, that should be way more than enough, and the skew shouldn't be big enough to make that much of a difference. Have you tried standard things like normalizing the input data? Can you plot the error as a function of training iterations? Maybe it doesn't converge? – ABC May 25 '15 at 02:25

I don't think this is "strange" behavior. Until you have a feel for your data, you have to try both and see which one fits your case better. Theoretically, L1 regularization produces sparse solutions, acting like a form of feature selection, while L2 regularization is smoother.
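The sparsity point can be seen in how each penalty shrinks weights. A minimal sketch (with made-up weights and a made-up penalty strength `lam`): the L1 proximal step (soft-thresholding) zeroes out small weights entirely, while the L2 step only scales every weight down:

```python
lam = 1.0
w = [3.0, 0.4, -2.5]

# L1: soft-thresholding — weights smaller than lam in magnitude become 0.
l1_step = [max(abs(wi) - lam, 0.0) * (1 if wi > 0 else -1) for wi in w]

# L2: uniform shrinkage — every weight is scaled toward 0, none exactly 0.
l2_step = [wi / (1 + lam) for wi in w]

print(l1_step)  # [2.0, 0.0, -1.5]
print(l2_step)  # [1.5, 0.2, -1.25]
```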

joe

I just realized LIBLINEAR provides two solvers for L2-regularized logistic regression:

0 -- L2-regularized logistic regression (primal)
7 -- L2-regularized logistic regression (dual)

I was using 7, which doesn't converge even after 1000 iterations. After I switched to 0, it converged very fast and was able to get to ~70% accuracy.

I believe dual vs. primal is mainly a difference in the optimization method, not the model itself, so this is probably a numerical issue with the dual solver on this data.
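One thing worth noting: in the sample data, feature 1 is in the hundreds while the others are single or double digits, and dual solvers are often sensitive to unscaled features. Scaling first (e.g. with the bundled svm-scale tool) is a common remedy. A minimal pure-Python min-max scaling sketch, using a few rows from the sample above:

```python
# Min-max scale each feature column to [0, 1]. Rows taken from the
# question's sample data (label column dropped).
rows = [
    [355.55660999775586, -3.401379785, 5.0],
    [753.80863694081768, -12.4841741178, 14.0],
    [294.15085871437088, -13.1649273486, 14.0],
]

cols = list(zip(*rows))
lo = [min(c) for c in cols]
hi = [max(c) for c in cols]

scaled = [[(v - l) / (h - l) for v, l, h in zip(row, lo, hi)] for row in rows]

# Every value now lies in [0, 1]; the column maxima map to exactly 1.0.
print(scaled[1])
```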

For more info on dual form vs. primal form: https://stats.stackexchange.com/questions/29059/logistic-regression-how-to-get-dual-function

menphix