Why do support vectors in SVM have alpha (Lagrangian multiplier) greater than zero?

Question

I understood the overall SVM algorithm consisting of Lagrangian Duality and all, but I am not able to understand why particularly the Lagrangian multiplier is greater than zero for support vectors.

Thank you.

You may find this answer on the stats Stack Exchange helpful: https://stats.stackexchange.com/questions/54976/why-are-the-lagrange-multipliers-sparse-for-svms - iRestMyCaseYourHonor, Jul 03 '20 at 20:17

score 10 · Answer 1 · answered Mar 20 '18 at 16:30

This might be a late answer but I am putting my understanding here for other visitors.

The Lagrangian multiplier, usually denoted by α, is a vector that assigns each training point a weight as a support vector.

Suppose there are m training examples. Then α is a vector of size m. Consider its ith element, α_i: it captures the weight of the ith training example as a support vector. A higher value of α_i means that the ith training example is more important as a support vector; when a prediction is made, that example contributes more to the decision.
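To make the "weight" reading concrete: in the dual form, the score of a new point is the sum over i of α_i y_i <x_i, x>, plus b, so a training example with α_i = 0 contributes nothing. Below is a minimal sketch, assuming a linear kernel. The function name `decision_value` and all the numbers are illustrative assumptions, not part of the question.

```python
from fractions import Fraction as F

def decision_value(alphas, labels, points, b, x):
    """Dual-form linear SVM score: sum_i alpha_i * y_i * <x_i, x> + b."""
    score = b
    for alpha, y, xi in zip(alphas, labels, points):
        score += alpha * y * sum(u * v for u, v in zip(xi, x))
    return score

# Illustrative values (assumed for this sketch, not taken from the question).
points = [(1, 2), (3, 1), (-1, -1)]
labels = [1, 1, -1]
alphas = [F(2, 13), F(0), F(2, 13)]  # the second example has zero weight
b = F(-3, 13)

x_new = (2, 2)
full = decision_value(alphas, labels, points, b, x_new)

# Drop every training example whose alpha is zero and score again.
kept = [(a, y, p) for a, y, p in zip(alphas, labels, points) if a != 0]
pruned = decision_value(*zip(*kept), b, x_new)

print(full, pruned, full == pruned)
```

The two scores agree, which is the sense in which only the examples with non-zero α_i matter at prediction time.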

Now coming to the OP's concern:

I am not able to understand why particularly the Lagrangian multiplier is greater than zero for support vectors.

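It is largely a matter of definition. Saying α_i = 0 just means that the ith training example has zero weight as a support vector; equivalently, you can say that the ith example is not a support vector. The multipliers of inequality constraints are constrained to be non-negative (α_i ≥ 0), so a non-zero weight is necessarily a positive one.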

Side note: One of the KKT conditions is complementary slackness: α_i g_i(w) = 0 for all i. A support vector must lie on the margin, which implies g_i(w) = 0. In that case α_i may or may not be zero; either way, complementary slackness is satisfied. If α_i = 0, you can choose whether or not to call such a point a support vector, based on the discussion above. For a non-support vector, however, α_i must be zero to satisfy complementary slackness, because g_i(w) is not zero.
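For reference, the hard-margin problem behind this condition can be written out as follows. This assumes the common convention g_i(w) = 1 - y_i(w·x_i + b) ≤ 0, with b suppressed in the notation; the answer does not spell out its g_i, so that form is an assumption.

```latex
\[
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad\text{s.t.}\quad g_i(w) = 1 - y_i\,(w^{\top}x_i + b) \le 0,
  \qquad i = 1,\dots,m,\\[2pt]
&L(w, b, \alpha) = \tfrac{1}{2}\lVert w\rVert^{2} + \sum_{i=1}^{m} \alpha_i\, g_i(w),\\[2pt]
&\alpha_i \ge 0 \ \ \text{(dual feasibility)},
  \qquad \alpha_i\, g_i(w) = 0 \ \ \text{(complementary slackness)}.
\end{aligned}
\]
```

Read together, the last two conditions say that α_i is never negative, and that it can be strictly positive only where g_i(w) = 0, that is, for points on the margin.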

score 0 · Answer 2 · answered Jun 17 '16 at 21:24

I can't figure this out either...

If we take a simple example of 3 data points, 2 of the positive class (y_i = 1): (1,2), (3,1), and one of the negative class (y_i = -1): (-1,-1), and solve using Lagrange multipliers, we get a perfect w = (0.25, 0.5) and b = -0.25, but one of the alphas is negative (a1 = 6/32, a2 = -1/32, a3 = 5/32).
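These numbers can be checked directly. The sketch below uses exact fractions to confirm that the stated alphas reproduce w = (0.25, 0.5) with every point exactly on the margin. It then compares that against a candidate with non-negative multipliers: α = (2/13, 0, 2/13), b = -3/13. That candidate is not from the post; it was worked out by hand for this sketch on the assumption that only (1,2) and (-1,-1) are support vectors. The helper names are hypothetical.

```python
from fractions import Fraction as F

points = [(1, 2), (3, 1), (-1, -1)]
labels = [1, 1, -1]

def weights(alphas):
    """w = sum_i alpha_i * y_i * x_i, computed per coordinate."""
    return tuple(sum(a * y * p[k] for a, y, p in zip(alphas, labels, points))
                 for k in range(2))

def margins(w, b):
    """y_i * (w . x_i + b) for each training point; feasible if all >= 1."""
    return [y * (w[0] * p[0] + w[1] * p[1] + b) for y, p in zip(labels, points)]

def sq_norm(w):
    """Squared Euclidean norm of w (the quantity the SVM minimizes, up to 1/2)."""
    return w[0] ** 2 + w[1] ** 2

# Values from the post: all three constraints treated as equalities.
alphas_eq = [F(6, 32), F(-1, 32), F(5, 32)]
w_eq, b_eq = weights(alphas_eq), F(-1, 4)

# Candidate with alpha >= 0 (hand-derived assumption for this sketch).
alphas_nn = [F(2, 13), F(0), F(2, 13)]
w_nn, b_nn = weights(alphas_nn), F(-3, 13)

for name, w, b in [("equalities", w_eq, b_eq), ("alpha >= 0", w_nn, b_nn)]:
    print(name, "w =", w, "margins =", margins(w, b), "|w|^2 =", sq_norm(w))
```

Both solutions separate the data with margins of at least 1, but the candidate with α ≥ 0 has the smaller ‖w‖² (4/13 versus 5/16). So forcing all three points onto the margin gives a feasible separator, not the maximum-margin one. In the candidate, (3,1) lies strictly outside the margin and gets α = 0.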
