In "TRAINING RECURRENT NEURAL NETWORK" by Ilya Sutskever, there's the following technique for calculating derivatives with backpropagation in feed-forward neural networks.
The network has $l$ hidden layers, $l+1$ weight matrices $W_1, \dots, W_{l+1}$, and $l+1$ bias vectors $b_1, \dots, b_{l+1}$.
"Forward" stage:
"Backwards" stage:
Isn't there an index problem with $l+1$? For example, in the forward stage the loop's last iteration computes $z_{l+1}$, yet the algorithm returns $z_l$.
(Since this is such a widely cited thesis, I assume I'm missing something.)
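To check my reading of the indexing, here is a minimal NumPy sketch (my own code, not from the thesis; the layer sizes, the tanh nonlinearity, and the squared-error loss are arbitrary choices for illustration). With $l$ hidden layers there are $l+1$ affine maps, and the forward loop's final iterate is $z_{l+1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

l = 2                                  # number of hidden layers
sizes = [4, 5, 5, 3]                   # input width, l hidden widths, output width
# l+1 weight matrices and l+1 bias vectors, indexed 1..l+1
W = [None] + [rng.standard_normal((sizes[i + 1], sizes[i])) * 0.1
              for i in range(l + 1)]
b = [None] + [np.zeros(sizes[i + 1]) for i in range(l + 1)]

def forward(inp):
    """Forward stage: z_0 = input; for i = 1..l+1,
    x_i = W_i z_{i-1} + b_i and z_i = tanh(x_i)."""
    z = [inp]
    x = [None]
    for i in range(1, l + 2):          # i = 1, ..., l+1
        x.append(W[i] @ z[i - 1] + b[i])
        z.append(np.tanh(x[i]))
    return x, z                        # z[l+1] is the last value computed

def backward(x, z, y):
    """Backward stage for the squared-error loss L = 0.5 * ||z_{l+1} - y||^2."""
    dW = [None] * (l + 2)
    db = [None] * (l + 2)
    dz = z[l + 1] - y                  # dL/dz_{l+1}
    for i in range(l + 1, 0, -1):      # i = l+1, ..., 1
        dx = dz * (1.0 - np.tanh(x[i]) ** 2)   # back through tanh
        dW[i] = np.outer(dx, z[i - 1])
        db[i] = dx
        dz = W[i].T @ dx               # becomes dL/dz_{i-1}
    return dW, db

inp = rng.standard_normal(sizes[0])
y = rng.standard_normal(sizes[-1])
x, z = forward(inp)
dW, db = backward(x, z, y)
print("output is z_{l+1}, shape:", z[l + 1].shape)
```

Written this way, the natural return value of the forward stage is `z[l + 1]`; returning `z[l]` would discard the output layer entirely, which is why the `return z_l` line looks like an off-by-one to me.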