I have seen the weights of neural networks initialized to random numbers, so I am curious: why are the weights of logistic regression initialized to zeros?
5 Answers
In the case of neural networks there are n neurons in each layer. So if you initialize the weight of each neuron to 0, then after backpropagation each of them will still have the same weights:
Neurons a1 and a2 in the first layer will have the same weights no matter how long you iterate, since they are computing the same function.
This is not the case with logistic regression, which is simply y = Wx + b.
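A minimal NumPy sketch (my own illustration, not from the answer) of this symmetry problem: a tiny network whose weights all start at zero keeps its two hidden units identical after every update, while zero-initialized logistic regression trains without any such issue. The toy data, network sizes, and learning rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # toy inputs (assumed data)
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- 1-hidden-layer net, all weights start at zero ---
W1, b1 = np.zeros((3, 2)), np.zeros(2)   # input -> 2 hidden units
W2, b2 = np.zeros((2, 1)), np.zeros(1)   # hidden -> output
lr = 0.1

for _ in range(1000):
    h = sigmoid(X @ W1 + b1)             # hidden activations
    p = sigmoid(h @ W2 + b2)[:, 0]       # predicted probability
    d_out = (p - y)[:, None] / len(y)    # gradient of binary cross-entropy at the output
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)   # backprop through the hidden layer
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.allclose(W1[:, 0], W1[:, 1]))   # True: both hidden units stay identical

# --- plain logistic regression, weights start at zero ---
w, b = np.zeros(3), 0.0
for _ in range(1000):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)      # different per feature, so it trains fine
    grad_b = (p - y).mean()
    w -= lr * grad_w; b -= lr * grad_b

print(w)                                 # non-zero, feature-specific weights
```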

Does that mean that for a NN, because there is no bias added, it will always remain the same? But that's not the case for logistic regression? – melaos Apr 20 '18 at 00:05
I think the above answers are a bit misleading. Actually, the sigmoid function (the logistic function, whose inverse is the logit) is always used in logistic regression for its special properties. For example, it is defined as
sigma(z) = 1 / (1 + e^(-z)),
and near z = 0 it is approximately linear, with its derivative sigma(z) * (1 - sigma(z)) at its maximum there.
[Plot of the sigmoid curve: roughly linear around z = 0, saturating toward 0 and 1.]
Thus, zeros ensure the values start in the linear area, making the propagation easier.
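As a rough illustration of this point (my own sketch, not from the answer), evaluating the sigmoid and its derivative at a few inputs shows that z = 0 is exactly where the derivative peaks at 0.25, i.e. the "linear area" the answer refers to, while large |z| lands in the saturated tails:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [-6.0, -2.0, 0.0, 2.0, 6.0]:
    s = sigmoid(z)
    print(f"z={z:+.1f}  sigmoid={s:.4f}  derivative={s * (1 - s):.4f}")
# The derivative sigmoid(z)*(1 - sigmoid(z)) peaks at 0.25 when z = 0
# and vanishes toward the saturated tails.
```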

If all the weights are initialized to zero, backpropagation will not work as expected, because the gradient for the intermediate and earlier neurons dies out (becomes zero) and they do not update. The reason is that in the backward pass of the NN, the gradient at an intermediate neuron is multiplied by the weights of the outgoing edges from that neuron to the neurons in the next layer; those weights are zero, so the gradient at that intermediate neuron is zero too. Consequently those weights never improve, and the model ends up correcting only the weights directly connected to the output neurons.
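A minimal sketch of a single backward pass (my own illustration; the shapes and toy data are assumptions) showing that with all-zero weights the hidden-layer gradient is multiplied by the zero outgoing weights and vanishes, while the output-layer weights do receive a gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))                         # toy batch
y = np.array([1., 1., 0., 1., 0., 1., 1., 0.])      # toy labels

W1, b1 = np.zeros((4, 5)), np.zeros(5)   # input -> hidden (all zeros)
W2, b2 = np.zeros((5, 1)), np.zeros(1)   # hidden -> output (all zeros)

h = sigmoid(X @ W1 + b1)                 # all 0.5
p = sigmoid(h @ W2 + b2)[:, 0]           # all 0.5

d_out = (p - y)[:, None] / len(y)        # gradient at the output pre-activation
dW2 = h.T @ d_out                        # nonzero: output weights get a gradient
d_h = (d_out @ W2.T) * h * (1 - h)       # multiplied by W2 == 0  ->  all zeros
dW1 = X.T @ d_h

print("dW2 nonzero:", np.any(dW2 != 0))  # True
print("dW1 all zero:", np.all(dW1 == 0)) # True: hidden weights see no gradient yet
```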

Does this mean that a neural net with weights initialized to zero is just as good as plain logistic regression, or, say, a NN with one single unit which computes Wx + b?

It's not equivalent, since you're using some kind of activation function (sigmoid, tanh, etc.) that gives you nonlinearity. I think if you initialize *all* the weights to zero, then you'll end up with effectively one neuron per layer, since all the weights will be equal for all neurons in each layer. – Baskaya Jun 13 '18 at 04:20
In logistic regression, the linear equation is a = Wx + b, where a is a scalar and W and x are both vectors. The derivative of the binary cross-entropy loss with respect to a single dimension of the weight vector, W[i], is a function of x[i], which is in general different from x[j] when i is not equal to j.
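A short sketch (my own illustration of this point) of the gradient at a zero initialization: with a sigmoid output a = sigmoid(Wx + b) and binary cross-entropy loss, dL/dW[i] = (a - y) * x[i], so each weight component gets its own gradient from its own feature and there is no symmetry to break:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # one example with 3 distinct features (assumed values)
y = 1.0
W = np.zeros(3)
b = 0.0

a = sigmoid(W @ x + b)           # prediction; 0.5 at the zero initialization
grad_W = (a - y) * x             # one distinct gradient per feature
grad_b = (a - y)

print(grad_W)                    # [-0.25  0.6  -1.5]: components differ because x[i] differ
```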