Why is 1-norm SVM more sparse than 2-norm SVM?

Question

How does using the 1-norm of the weights in the cost function increase sparsity, compared to using the 2-norm of the weights in the same cost function for an SVM?

For 1-norm: cost function: minimize ||w||_1
For 2-norm: cost function: minimize ||w||_2

Is it related to LP-SVM?

Rob Neuhaus · Answer 1 · 2012-06-17T15:23:37.820

Look at the partial derivative of the l_1 penalty with respect to some parameter.

The magnitude of that derivative is constant, whatever the size of the weight. So any increase in a weight has to buy a fixed amount of error reduction to be worthwhile, regardless of how small the weight already is. If it can't, the weight stays at exactly 0.

Compare this to the l_2 penalty, where the derivative scales with the size of the current parameter. As the weight gets near 0, only an infinitesimal decrease in error is needed to offset the regularization penalty, so weights shrink toward 0 but rarely land exactly on it.
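To make the derivative argument concrete, here is a minimal sketch on a one-dimensional toy problem (not an actual SVM). It assumes a quadratic error term 0.5*(w - a)^2, where a is the value the weight would take with no regularization, and a penalty strength lam. The function names and numbers are made up for illustration.

```python
def l1_minimizer(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w), i.e. soft-thresholding."""
    # The error gradient at w = 0 has magnitude |a|. If that cannot beat the
    # constant penalty slope lam, the best weight is exactly zero.
    if abs(a) <= lam:
        return 0.0
    return a - lam if a > 0 else a + lam


def l2_minimizer(a, lam):
    """Minimizer of 0.5*(w - a)**2 + 0.5*lam*w**2."""
    # The penalty slope lam*w vanishes as w -> 0, so the weight is only
    # rescaled and stays nonzero whenever a is nonzero.
    return a / (1.0 + lam)


targets = [2.0, 0.3, -0.1, -1.5]  # hypothetical unregularized weights
lam = 0.5                         # hypothetical penalty strength

l1_solution = [l1_minimizer(a, lam) for a in targets]
l2_solution = [l2_minimizer(a, lam) for a in targets]

print("l1:", l1_solution)
print("l2:", l2_solution)
```

With these numbers the two small targets are set to exactly 0 under the l_1 penalty, while under the l_2 penalty every weight is shrunk but none becomes 0.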

score 0 · Answer 2 · answered Jun 16 '12 at 09:38

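Note that w_i^2 < |w_i| for any component with 0 < |w_i| < 1 (which is usually the case), since the L2 penalty squares the weights.

So the L1 penalty punishes small weights more heavily than the squared L2 penalty does. That makes ||w||_1 the harder constraint on small weights, and it results in a sparse vector.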
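A quick numeric check of that comparison, as a sketch with made-up weight values: for weights below 1 in magnitude, the L1 cost |w| is larger than the squared L2 cost w^2, and the gap widens as the weight shrinks.

```python
weights = [0.5, 0.1, 0.01]  # hypothetical small weights

# How many times larger the L1 cost is than the squared L2 cost.
ratios = []
for w in weights:
    l1_cost = abs(w)
    l2_cost = w ** 2
    ratios.append(l1_cost / l2_cost)
    print(f"w={w}: |w|={l1_cost}, w^2={l2_cost}")
```

The ratio |w| / w^2 equals 1/|w|, so the smaller the weight, the less the squared penalty cares about it relative to L1.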

This isn't specific to SVMs; many algorithms use L1 or L2 regularization.
