In multi-class logistic regression, does an SGD step on one training example update all the weights?

Question

In multi-class logistic regression, let's say we use softmax and cross-entropy. Does an SGD step on one training example update all the weights, or only the portion of the weights associated with the label? For example, if the label is the one-hot vector [0,0,1], is the whole matrix W_{feature_dim \times num_class} updated, or only the column W^{3}_{feature_dim \times 1}?

Thanks

score 0 · Answer 1 · answered Jun 18 '18 at 08:18

All of your weights are updated.

You have y = Softmax(W x + β), so predicting y from a single x uses every weight in W. Any parameter that contributes to the forward pass (prediction) receives a gradient in the backward pass (SGD), and for softmax with cross-entropy that gradient is generally non-zero for every class: each predicted probability is strictly positive, while the target is 0 for all classes but one. Perhaps a more intuitive way of thinking about it is that you are predicting class membership probabilities that must sum to 1; assigning more probability to one class means taking it away from the others, so the weights of both need to be updated.
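To make this concrete, here is a short derivation sketch. The symbols z (pre-activation scores), p (softmax output), t (one-hot target) and L (cross-entropy loss for one example) are introduced here for illustration and are not in the original answer. It also assumes W has shape feature_dim × num_class, as in the question, so the scores are written with a transpose.

```latex
z = W^{\top} x + \beta, \qquad
p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}, \qquad
L = -\sum_k t_k \log p_k

\frac{\partial L}{\partial z_k} = p_k - t_k, \qquad
\frac{\partial L}{\partial W_{ik}} = x_i \,(p_k - t_k), \qquad
\frac{\partial L}{\partial \beta_k} = p_k - t_k
```

For the label [0,0,1], the error terms are p_1 > 0, p_2 > 0 and p_3 - 1 < 0, so none of them vanish and every column of W receives an update. The only entries left unchanged are rows where the input feature x_i is exactly 0.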

Take for instance the simple case of x ∈ ℝ, y ∈ ℝ³. Then W ∈ ℝ^{1×3}. Before activation, your prediction for a given x looks like: y = [y₁ = W₁₁x + β₁, y₂ = W₁₂x + β₂, y₃ = W₁₃x + β₃]. Categorical cross-entropy gives you an error signal for each of these mini-predictions, so you must compute the derivative with respect to every W and β term, not only those of the labelled class.
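The one-feature, three-class case above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the helper names `softmax` and `sgd_step`, the learning rate, and the numeric values of x, W and β are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sgd_step(W, b, x, y_onehot, lr=0.1):
    """One SGD step on a single example for softmax + cross-entropy.

    Hypothetical helper for illustration; W has shape
    (feature_dim, num_class) and b has shape (num_class,).
    """
    z = x @ W + b              # pre-activation scores, shape (num_class,)
    p = softmax(z)             # predicted class probabilities
    err = p - y_onehot         # dL/dz: one error term per class
    grad_W = np.outer(x, err)  # dL/dW, shape (feature_dim, num_class)
    grad_b = err               # dL/db
    return W - lr * grad_W, b - lr * grad_b, grad_W

# Illustrative values (assumptions, not taken from the original post).
x = np.array([2.0])                 # single feature, x in R
W = np.array([[0.5, -0.3, 0.1]])    # W in R^{1x3}
b = np.zeros(3)                     # the beta terms
y = np.array([0.0, 0.0, 1.0])       # one-hot label for the third class

W_new, b_new, grad_W = sgd_step(W, b, x, y)
print(grad_W)                       # gradient for every class column
print(np.all(grad_W != 0))          # True: no column is left untouched
```

The gradient is positive for the two wrong classes (their scores are pushed down) and negative for the labelled class (its score is pushed up), which matches the intuition that probability is moved from the other classes to the correct one.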

I hope this is clear.
