Thanks for reading this post!
Quick question for the RNN enthusiasts here:
I know that in backpropagation through time (BPTT), there are at least three steps:
For each element in the sequence:
Step 1 - Compute the 'error ratio' of each neuron, propagating from the upper layer to the lower layer.
Step 2 - Compute a 'weight delta' for each weight (X) using the error ratio mentioned in step 1, and push it into an array.
After the sequence is finished:
Step 3 - Sum all weight deltas of weight (X) and add the result to the current value of weight (X).
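To make sure I'm describing it right, here's a rough numpy sketch of those three steps for a single weight matrix (all shapes and values are made up, and the per-step errors are just taken as given instead of actually being propagated down from an upper layer):

```python
import numpy as np

np.random.seed(0)
seq_len, n_in, n_out = 4, 3, 2
W = np.random.randn(n_out, n_in) * 0.1                      # the weight (X)
inputs = [np.random.randn(n_in) for _ in range(seq_len)]
# Step 1 stand-in: pretend these are the per-neuron 'error ratios'
# already propagated from the upper layer at each timestep.
errors = [np.random.randn(n_out) for _ in range(seq_len)]

weight_deltas = []
for t in range(seq_len):
    delta = errors[t]
    # Step 2: weight delta for W at this timestep, pushed into an array.
    weight_deltas.append(np.outer(delta, inputs[t]))

# Step 3: after the sequence, sum all the deltas and apply them to W.
learning_rate = 0.01
W += learning_rate * sum(weight_deltas)
```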
I am now trying to implement a clockwork RNN (CW-RNN), following the paper found here: http://jmlr.org/proceedings/papers/v32/koutnik14.pdf
From what I understand, each 'module' in the hidden layer has the same number of neurons, just a different clock period.
The forward pass of a CW RNN seems pretty easy and intuitive.
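Here's how I currently picture the forward pass, as a hedged numpy sketch (4 modules of equal size with clock periods 1, 2, 4, 8; a module updates only when timestep MOD period == 0, otherwise it copies its previous activation; I've left out the block-triangular masking of the recurrent matrix for brevity):

```python
import numpy as np

np.random.seed(0)
n_modules, module_size = 4, 3
periods = [1, 2, 4, 8]                      # one clock period per module
hidden = n_modules * module_size
W_in = np.random.randn(hidden, 2) * 0.1     # input weights (made-up shapes)
W_h = np.random.randn(hidden, hidden) * 0.1 # recurrent weights

h = np.zeros(hidden)
for t in range(8):
    x = np.random.randn(2)
    new_h = np.tanh(W_h @ h + W_in @ x)
    for i, period in enumerate(periods):
        sl = slice(i * module_size, (i + 1) * module_size)
        if t % period != 0:
            # Non-activated module: copy its old activation forward.
            new_h[sl] = h[sl]
    h = new_h
```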
As for the backward pass, however, that's a different story.
Quoting the documentation :
The backward pass of the error propagation is similar to
SRN as well. The only difference is that the error propagates
only from modules that were executed at time step t. The
error of non-activated modules gets copied back in time
(similarly to copying the activations of nodes not activated
at the time step t during the corresponding forward pass),
where it is added to the back-propagated error.
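Here's my attempt at sketching just the "copied back in time" mechanic from that quote, with numpy (everything here is invented for illustration; the actual backprop math for an active module is left as a comment because that's exactly the part I'm unsure about):

```python
import numpy as np

np.random.seed(1)
periods = [1, 2, 4, 8]          # one clock period per module
module_size = 2
T = 8                           # sequence length
# Made-up per-timestep errors arriving at each module from above.
incoming = [[np.random.randn(module_size) for _ in periods]
            for _ in range(T)]

pending = [np.zeros(module_size) for _ in periods]  # copied-back error
processed_at = {i: [] for i in range(len(periods))}

for t in reversed(range(T)):
    for i, period in enumerate(periods):
        pending[i] = pending[i] + incoming[t][i]
        if t % period == 0:
            # Module was executed at step t: its accumulated error would
            # be consumed here (this is where the usual SRN backprop,
            # i.e. some combination of steps 1 and 2, should run).
            processed_at[i].append(t)
            pending[i] = np.zeros(module_size)
        # else: module not executed at t, so its error just stays in
        # `pending`, i.e. gets copied back in time and added later.

print(processed_at[3])          # the period-8 module: [0]
```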
This is where I get confused.
Which of the above backpropagation step(s) are applied to a non-activated module in the hidden layer?
(A module for which timestep MOD its clock period != 0.)
Step 1, Step 2, or both?
Thanks again for your help!