I'm totally new to machine learning. I understand the concepts of backpropagation and recurrent neural networks, but I can't seem to grasp backpropagation through time. Here is the Wikipedia pseudocode:
    Back_Propagation_Through_Time(a, y)   // a[t] is the input at time t. y[t] is the output
        Unfold the network to contain k instances of f
        do until stopping criterion is met:
            x = the zero-magnitude vector  // x is the current context
            for t from 0 to n - k          // t is time. n is the length of the training sequence
                Set the network inputs to x, a[t], a[t+1], ..., a[t+k-1]
                p = forward-propagate the inputs over the whole unfolded network
                e = y[t+k] - p             // error = target - prediction
                Back-propagate the error, e, back across the whole unfolded network
                Update all the weights in the network
                Average the weights in each instance of f together, so that each f is identical
                x = f(x, a[t])             // compute the context for the next time-step
So, as I understand it, we take the target output at the current step, forward-propagate the inputs through the k unfolded copies of the network, and calculate the error between that target and the resulting prediction.
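
To check whether I'm reading it correctly, here is my own attempt to turn the pseudocode into a tiny NumPy script. Everything concrete in it (the tanh nonlinearity, the squared-error loss, the names W_in, W_rec, W_out, and all the sizes) is my own guess, not something from the article:

    import numpy as np

    np.random.seed(0)
    n_in, n_hid, k, n = 3, 5, 4, 20
    W_in  = 0.1 * np.random.randn(n_hid, n_in)    # input  -> context weights
    W_rec = 0.1 * np.random.randn(n_hid, n_hid)   # context -> context weights (the "f")
    W_out = 0.1 * np.random.randn(1, n_hid)       # context -> prediction weights
    a = np.random.randn(n, n_in)                  # a[t]: input at time t
    y = np.random.randn(n + 1)                    # y[t]: target output
    lr = 0.01

    for epoch in range(100):                      # "do until stopping criterion is met"
        x = np.zeros(n_hid)                       # x = the zero-magnitude context vector
        for t in range(n - k + 1):                # t from 0 to n - k
            # forward-propagate over the k unfolded instances of f
            xs = [x]
            for i in range(k):
                xs.append(np.tanh(W_in @ a[t + i] + W_rec @ xs[-1]))
            p = (W_out @ xs[-1]).item()           # prediction
            e = y[t + k] - p                      # error = target - prediction

            # back-propagate e across the whole unfolded network
            dW_out = -e * xs[-1][None, :]         # gradient of 0.5*e**2 w.r.t. W_out
            dx = -e * W_out.ravel()               # gradient w.r.t. the last context
            dW_in, dW_rec = np.zeros_like(W_in), np.zeros_like(W_rec)
            for i in reversed(range(k)):
                dz = dx * (1.0 - xs[i + 1] ** 2)  # back through the tanh
                dW_in  += np.outer(dz, a[t + i])  # each unfolded instance adds its share
                dW_rec += np.outer(dz, xs[i])
                dx = W_rec.T @ dz                 # pass the gradient to the previous step

            # update all the weights (I keep one shared copy, so every instance
            # of f stays identical automatically; see my question below)
            W_in  -= lr * dW_in
            W_rec -= lr * dW_rec
            W_out -= lr * dW_out

            x = np.tanh(W_in @ a[t] + W_rec @ x)  # x = f(x, a[t]): context for next step

Have I translated the forward and backward passes correctly?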
How exactly are we updating the weights here? And what is the meaning of this line:

    Average the weights in each instance of f together, so that each f is identical
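
My best guess is that if you literally kept k separate weight copies, one per unfolded instance of f, and let each copy receive its own update, you would then have to tie them back together, something like this (toy shapes, all names hypothetical):

    import numpy as np

    # Hypothetical reading: k separate copies of f's weights, one per unfolded
    # instance, which have drifted apart after their individual updates.
    k, n_hid = 4, 5
    rng = np.random.default_rng(0)
    W_copies = [rng.standard_normal((n_hid, n_hid)) for _ in range(k)]

    W_avg = sum(W_copies) / k                    # average the instances together...
    W_copies = [W_avg.copy() for _ in range(k)]  # ...so that each f is identical again

Is that the right reading, or does the averaging amount to the same thing as summing the per-instance gradients into one shared weight matrix, as in my sketch above?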
Can anyone describe what BPTT is in simple terms, or give a simple reference for a beginner?