I'm working through TensorFlow's language model tutorial (the PTB model). My question is:
In these lines, they use a DropoutWrapper to apply dropout to the LSTM cell's outputs:
```python
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0)
if is_training and config.keep_prob < 1:
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
        lstm_cell, output_keep_prob=config.keep_prob)
```
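For context, the tutorial then stacks the wrapped cell into multiple layers, roughly like this:

```python
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers)
```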
Why do they then have to apply dropout to the inputs as well, in these lines?
```python
if is_training and config.keep_prob < 1:
    inputs = tf.nn.dropout(inputs, config.keep_prob)
```
Thanks!
Edit: OK, I didn't fully understand the paper at the time I wrote this question. Basically, Zaremba et al. suggest applying dropout to every connection except the hidden-to-hidden (recurrent) ones. Since a layer's output is the next layer's input, wrapping each cell with output_keep_prob already covers every layer-to-layer connection plus the last layer's connection to the softmax; the explicit tf.nn.dropout on inputs covers the one remaining non-recurrent connection, from the embedding into the first layer.
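To make that concrete, here is a minimal sketch of where dropout ends up in a 2-layer model. It uses the TF 1.x API as in the tutorial, but the concrete values standing in for config.* are made up:

```python
import tensorflow as tf

# Hypothetical values standing in for the tutorial's config.*
keep_prob, num_layers, size = 0.5, 2, 200
batch_size, num_steps = 20, 35

def lstm_with_output_dropout():
    cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0)
    # output_keep_prob only drops h_t on its way up to the next layer
    # (or the softmax); the recurrent h_{t-1} -> h_t path is untouched.
    return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)

cell = tf.nn.rnn_cell.MultiRNNCell(
    [lstm_with_output_dropout() for _ in range(num_layers)])

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, size])
# The wrappers cover layer->layer and last-layer->softmax connections;
# this covers the one they miss: embedding -> first layer.
inputs = tf.nn.dropout(inputs, keep_prob)

outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```

So with L layers there are L+1 non-recurrent connections, the L wrappers handle L of them, and the input dropout handles the last one.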