I've been reading about error functions for neural nets on my own. http://neuralnetworksanddeeplearning.com/chap3.html explains that using the cross-entropy cost function avoids learning slowdown (i.e. the network learns faster when the predicted output is far from the target output). The author shows that for the weights connected to the output layer, the sigmoid-prime term, which is what causes the slowdown, cancels out of the gradient.
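If I'm reading it right, the output-layer result is (using the chapter's notation, with $a^L_j = \sigma(z^L_j)$ and the cross-entropy cost summed over the output neurons):

$$\frac{\partial C}{\partial w^L_{jk}} = a^{L-1}_k \,\bigl(a^L_j - y_j\bigr),$$

so no $\sigma'(z^L_j)$ factor appears there.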
But what about the weights further back? When I work through the derivation myself (I get the same result as when the quadratic cost function was used), I find that the sigmoid-prime term still appears in the gradients for those weights. Wouldn't that still contribute to slowdown? (Or did I derive it incorrectly?)
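For concreteness, here is what my derivation gives for a weight one layer back, using $\delta^L_i = a^L_i - y_i$ from the output layer:

$$\frac{\partial C}{\partial w^{L-1}_{jk}} = a^{L-2}_k \,\sigma'(z^{L-1}_j) \sum_i w^L_{ij}\,\bigl(a^L_i - y_i\bigr),$$

so the $\sigma'(z^{L-1}_j)$ factor shows up again, exactly as it would with the quadratic cost.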