I've been reading about error functions for neural nets on my own. http://neuralnetworksanddeeplearning.com/chap3.html explains that using the cross-entropy cost function avoids learning slowdown (i.e. the network learns faster when the predicted output is far from the target output). The author shows that for the weights connected to the output layer, the sigmoid-prime term, which is what causes the slowdown, cancels out of the gradient.
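If I'm reading it right, the output-layer result is (using the chapter's notation, with $a^L_j = \sigma(z^L_j)$ and the cross-entropy cost summed over the output neurons):

$$\frac{\partial C}{\partial w^L_{jk}} = a^{L-1}_k \,\bigl(a^L_j - y_j\bigr),$$

so no $\sigma'(z^L_j)$ factor appears there.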
But what about the weights further back? When I work through the derivation myself (I get the same result as when the quadratic cost function was used), I find that the sigmoid-prime term still appears in the gradients for those weights. Wouldn't that still contribute to slowdown? (Or did I derive it incorrectly?)
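For concreteness, here is what my derivation gives for a weight one layer back, using $\delta^L_i = a^L_i - y_i$ from the output layer:

$$\frac{\partial C}{\partial w^{L-1}_{jk}} = a^{L-2}_k \,\sigma'(z^{L-1}_j) \sum_i w^L_{ij}\,\bigl(a^L_i - y_i\bigr),$$

so the $\sigma'(z^{L-1}_j)$ factor shows up again, exactly as it would with the quadratic cost.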