These days I have been studying RNNs and teacher forcing, but there is one point I can't figure out. What is the principle behind the readout and teacher forcing? How do we feed the output (or ground truth) of the RNN from the previous time step back into the current time step: by using the output as a feature together with this step's input, or by using it as this step's cell state? I have read some papers but I'm still confused. o(╯□╰)o Hoping someone can answer this for me.
1 Answer
Teacher forcing is the act of using the ground truth as the input at each time step, rather than the output of the network from the previous step. The following pseudocode contrasts the two modes:
x   = inputs            --> ground truth, time steps [0:n]
y   = expected_outputs  --> ground truth, time steps [1:n+1]
out = network_outputs   --> predictions, time steps [1:n+1]

free_running():
    for step in sequence:
        out[step + 1] = cell(out[step])   # feed the network's own previous output

teacher_forcing():
    for step in sequence:
        out[step + 1] = cell(x[step])     # feed the ground truth instead
As you can see, rather than feeding the output of the network at the previous time step back in as input, teacher forcing provides the ground truth.
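For concreteness, here is a minimal runnable sketch in PyTorch of both decoding modes; the names (embed, cell, readout, decode) and the toy dimensions are my own illustration, not from the original answer:

import torch
import torch.nn as nn

vocab_size, emb_size, hidden_size = 100, 32, 64

embed   = nn.Embedding(vocab_size, emb_size)   # token id -> vector
cell    = nn.GRUCell(emb_size, hidden_size)    # one recurrent step
readout = nn.Linear(hidden_size, vocab_size)   # hidden state -> next-token logits

def decode(ground_truth, teacher_forcing=True):
    # ground_truth: LongTensor of shape (batch, seq_len)
    batch, seq_len = ground_truth.shape
    h = torch.zeros(batch, hidden_size)        # initial hidden state
    inp = ground_truth[:, 0]                   # the first token is always given
    all_logits = []
    for t in range(1, seq_len):
        h = cell(embed(inp), h)                # recurrent update
        logits = readout(h)                    # predict the token at step t
        all_logits.append(logits)
        if teacher_forcing:
            inp = ground_truth[:, t]           # feed the true token back in
        else:
            inp = logits.argmax(dim=-1)        # feed the model's own prediction
    return torch.stack(all_logits, dim=1)      # (batch, seq_len - 1, vocab_size)

The typical pattern is decode(batch, teacher_forcing=True) during training and teacher_forcing=False at inference time.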
It was originally motivated as a way to avoid BPTT (backpropagation through time) for models that don't contain hidden-to-hidden connections, i.e. models whose only recurrence runs from the output of one step to the hidden state of the next (unlike GRUs, whose recurrence is hidden-to-hidden).
It can also be used as a training regime: the idea is that, over the course of training, you slowly decrease the amount of teacher forcing. This has been shown to have a regularizing effect on the network. The paper linked here has further reading, and the Deep Learning Book is good too. One common way to implement the decay is sketched below.
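A minimal sketch of such a schedule, assuming an inverse-sigmoid decay with a per-step coin flip (the constant k and the coin flip are illustrative choices, not taken from the linked paper):

import math
import random

def teacher_forcing_ratio(global_step, k=1000.0):
    # Inverse-sigmoid decay: near 1.0 early in training, approaching 0.0 later.
    return k / (k + math.exp(global_step / k))

# Inside the decoding loop above, replace the fixed flag with a coin flip:
#     use_truth = random.random() < teacher_forcing_ratio(global_step)
#     inp = ground_truth[:, t] if use_truth else logits.argmax(dim=-1)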
-
Can you say teacher forcing is simply a feedforward neural network? – ArtificiallyIntelligence Nov 10 '17 at 06:21
-
No, since information still flows through the recurrent connections; a feedforward network has no recurrent connections. – Oliver Nov 11 '17 at 07:15
-
But assume x_{i+1} = F(x_i), where F is the recurrent neural network model I have. What if you unroll a standard recurrent neural network into multiple feedforward neural networks that share weights and, at the same time, use teacher forcing to train this RNN? Wouldn't this whole process be exactly training a feedforward neural network? In any case, do you agree that teacher forcing simply parallelizes the training process in a temporal sense? – ArtificiallyIntelligence Nov 12 '17 at 04:31
-
A vanilla RNN is parameterized as h_i = f(x_i, h_{i-1}). The function you have above cannot be a recurrent net, since it has no recurrent input, i.e. there is no h_{i-1}. – Oliver Nov 13 '17 at 07:49