
In "TRAINING RECURRENT NEURAL NETWORK" by Ilya Sutskever, there's the following technique for calculating derivatives with backpropagation in feed-forward neural networks.

The network has l hidden layers, l+1 weight matrices and l+1 bias vectors.

"Forward" stage:

[Image: forward-pass pseudocode from the thesis]

"Backwards" stage:

[Image: backward-pass pseudocode from the thesis]

Isn't there an index problem with l+1? For example, in the forward stage we compute z_(l+1) but return z_l.

(Since this is such a major paper, I guess I'm missing something)

Paz

1 Answer


There is no problem: some of the indices start at 0 (the variable z, for instance), and some start at 1 (the variable x). Follow the algorithm as laid out more carefully; try writing it out by hand explicitly for, say, l = 4.
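To make the indexing concrete, here is a minimal NumPy sketch of the same bookkeeping (my own notation, not a transcription of the thesis' pseudocode): z[0] is the input x, z[l+1] is the output, W and b are indexed 0..l, and both loops run over i = 1, ..., l+1. The choice of tanh for every layer is just for illustration.

```python
import numpy as np

def forward(x, W, b, e=np.tanh):
    """Forward pass: z[0] = x, z[i] = e(W[i-1] z[i-1] + b[i-1]) for i = 1..l+1."""
    l = len(W) - 1                        # l hidden layers -> l+1 weight matrices
    z = [x]                               # z[0] is the input
    for i in range(1, l + 2):             # i = 1, ..., l+1
        z.append(e(W[i - 1] @ z[i - 1] + b[i - 1]))
    return z                              # the output is z[l+1], the last entry

def backward(z, W, dL_dout, e_prime=lambda a: 1.0 - a ** 2):
    """Backward pass: start from dz[l+1] = dL/d(output) and walk down to dz[0]."""
    l = len(W) - 1
    dz = [None] * (l + 2)
    dW = [None] * (l + 1)
    db = [None] * (l + 1)
    dz[l + 1] = dL_dout                   # gradient w.r.t. the output z[l+1]
    for i in range(l + 1, 0, -1):         # i = l+1, ..., 1
        dx = dz[i] * e_prime(z[i])        # back through the tanh nonlinearity
        dW[i - 1] = np.outer(dx, z[i - 1])
        db[i - 1] = dx
        dz[i - 1] = W[i - 1].T @ dx       # gradient w.r.t. the previous activation
    return dW, db, dz
```

Writing it out for l = 4, as suggested above: W and b each have 5 entries (indices 0..4), z has 6 entries (z[0] through z[5]), the forward loop ends by computing z[5] = z[l+1], and that last entry is what should be returned.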

Raff.Edward
  • I still don't understand why we output z_l when the last calculated entry was z_(l+1). Also, it seems z's length is (l+2), since we access z_0 and z_(l+1). Is that correct? – Paz Jul 03 '15 at 20:47
  • 2
    It looks like the output is a typo, yes. z has a length of (l+2) yes, but its more for notational convince - treating the input x as an "activation" (z_0) rather than a special case. – Raff.Edward Jul 03 '15 at 21:53
  • OK, so it is a typo after all. Is this also the case in the other direction? (We initialize dz_l but inside the loop use dz_(l+1).) – Paz Jul 04 '15 at 05:43
  • It looks like that one is also a typo. All three z_l should be z_(l+1), and then the backwards z_l gets computed on line 4. – Raff.Edward Jul 06 '15 at 13:37