
I'm working on the TensorFlow Language Model tutorial. My question is:

In these lines, they use a wrapper to apply dropout to the RNN cell:

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0)
if is_training and config.keep_prob < 1:
  lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
      lstm_cell, output_keep_prob=config.keep_prob)

Why do they have to apply dropout again to the inputs in this line?

if is_training and config.keep_prob < 1:
  inputs = tf.nn.dropout(inputs, config.keep_prob)

Thanks!

Edit: OK, I didn't fully understand the paper at the time I wrote this question. Basically, Zaremba et al. suggest applying dropout everywhere except on the hidden-to-hidden (recurrent) connections. Since a layer's output is the next layer's input, applying dropout to every layer's output, plus to the input of the first layer, covers all the non-recurrent connections.
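
To make this concrete, here is a minimal sketch of how the pieces fit together (it uses the old tf.nn.rnn_cell API from the tutorial; size, num_layers, keep_prob and the placeholder shape are made-up values, not the tutorial's config):

import tensorflow as tf

size, num_layers, keep_prob = 200, 2, 0.5
# illustrative input: batch x time_steps x size
inputs = tf.placeholder(tf.float32, [20, 35, size])

# "regular" dropout on the input of the first layer
inputs = tf.nn.dropout(inputs, keep_prob)

# each wrapper drops its layer's output, which is the next layer's input,
# so every non-recurrent connection gets dropout while the recurrent
# (hidden-to-hidden) connections are left untouched
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0)
lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
stacked_cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_layers)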

  • Could you make the question self-contained? The pointers are valid, but it is not convenient to see at a glance what you are talking about. That said, the two dropouts are different kinds. The [RNN dropout wrapper is specific to RNNs](http://arxiv.org/abs/1409.2329), whereas the input dropout is the "regular" one, dropping some of the input entries. – Eric Platon Jun 10 '16 at 23:18
  • @EricPlaton Thanks for your answer. I've edited the question. I read that paper too. From what I understand, the paper details how dropout is applied only to the non-recurrent connections of RNNs. I don't see where they say they apply dropout to the input entries. Is it a popular practice? If possible, can you point me to some papers on good dropout practice? Thanks! – tnq177 Jun 10 '16 at 23:40
  • That paper explains the background behind the "dropout wrapper" for RNNs. Earlier papers circa 2010 mention the now-"usual" dropout for feed-forward networks, but they differ in detail (the goal is the same, though). The dropout on inputs here is a "usual" one. My understanding (and bear with me, this is just my understanding for now) is that the dropout on inputs completes the dropout on the LSTM part. After all, the input layer here is a perceptron (a feed-forward sub-network). The input dropout is useful because the network accepts sequences as inputs (e.g. a phrase, a video, or a sound track). – Eric Platon Jun 11 '16 at 06:52
  • Good question, I wonder why the person who wrote the code didn't choose to use `input_keep_prob` in `DropoutWrapper` if the input dropout was to be applied as well... – Blue482 Sep 25 '16 at 23:06

0 Answers