
I am trying to use dropout with CudnnLSTM (tf.contrib.cudnn_rnn.python.layers.CudnnLSTM), and I would like to build just one graph, set the dropout to some non-zero fraction for training, and then set it to 0 for measuring validation error metrics. With the usual TensorFlow LSTM cell (tf.contrib.rnn.LSTMCell) this is not too difficult, because the keep_prob argument (via DropoutWrapper) accepts a Tensor, but I find this is not an option for CudnnLSTM.
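
For reference, this is the sort of thing I mean with the standard cell (a rough sketch using DropoutWrapper with a placeholder; the sizes and names are just for illustration):

import tensorflow as tf

keep_prob = tf.placeholder_with_default(1.0, shape=[])       # default 1.0 = no dropout
cell = tf.contrib.rnn.LSTMCell(num_units=128)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)

inputs = tf.placeholder(tf.float32, [None, 100, 3])           # [batch, time, features]
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# training step:   sess.run(..., feed_dict={inputs: x_train, keep_prob: 0.5})
# validation step: sess.run(..., feed_dict={inputs: x_val})   # keep_prob falls back to 1.0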

To be able to change the dropout, I tried using a global variable to set it and then changing the value of that global variable between training and validation, but I don't think this works (I can't prove it, but it's my best guess). In particular, my training and validation errors are about the same, whereas in the past, when I trained with dropout in an RNN on the same data set, the validation error quickly became better than the training error (since validation has the dropout set to 0). I have seen exactly that with the usual LSTM on this data set, so I expected something similar with Cudnn.
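
To make the setup concrete, here is roughly the pattern I tried (a sketch, not my actual code; the names and sizes are illustrative):

import tensorflow as tf
from tensorflow.contrib.cudnn_rnn.python.layers import cudnn_rnn

DROPOUT = 0.5  # global read when the layer is constructed
lstm = cudnn_rnn.CudnnLSTM(num_layers=2, num_units=64, dropout=DROPOUT)

inputs = tf.zeros([100, 32, 3])       # [time, batch, features] for the Cudnn layers
outputs, _ = lstm(inputs, training=True)

DROPOUT = 0.0  # does reassigning the global here actually change the dropout
               # used by the graph that was already built above?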

So I have two questions.

  1. How can I know definitively whether the dropout actually changes when I change the value of the global variable I used to set it? (My guess is that it does not, but if someone tells me I am wrong, how can I verify this?) The git commit history is, at least to me, a little confusing as to whether dropout actually works even in the layers implementation.
  2. If setting the dropout via a global doesn't work, and I can't use a Tensor, how can I set the dropout differently for training and validation? I suppose one way would be to build two graphs that share weights, but how would I do that given that CudnnLSTM creates its own weights rather than having them passed in? Could someone provide a code example? I have not been able to find one.

Thanks for any help.

  • Have you tried `DropoutWrapper` that can be applied to any rnn cell? – Maxim Dec 08 '17 at 14:07
  • @Maxim I will give it a try but my working assumption was that DropoutWrapper doesn't directly work with the CUDA RNN API as does CudnnLSTM – TFdoe Dec 08 '17 at 18:23

1 Answer


The training parameter of the model's call method (together with a non-zero dropout value) controls whether dropout takes effect: if training=True, dropout is applied; if training=False, it is ignored.

### testing out dropout with cudnn_rnn to see how it works
import tensorflow as tf
from tensorflow.contrib.cudnn_rnn.python.layers import cudnn_rnn

layers       = 5
hidden_units = 3
dropout      = 1.0   # drop everything so the effect is obvious
model        = cudnn_rnn.CudnnGRU(layers, hidden_units, dropout = dropout)

data = tf.ones([128, 100, 3])   # [max_time, batch_size, input_size]
model.build(data.shape)

training_output,  training_state  = model(data, training = True)
inference_output, inference_state = model(data, training = False)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
x, y = sess.run([training_output,  training_state])
w, v = sess.run([inference_output, inference_state])

We can see that x and y are all zeros, because dropout is set to 1.0 and training=True, while w and v are non-zero, because training=False skips the dropout.
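
Applied to the original question, the idea would be to build a single Cudnn layer and call it twice, once with training=True for the training graph and once with training=False for validation; both calls should reuse the same internal weights, since the layer only builds them once (a sketch under that assumption, with illustrative placeholder inputs):

lstm = cudnn_rnn.CudnnLSTM(num_layers=2, num_units=64, dropout=0.3)

train_inputs = tf.placeholder(tf.float32, [100, 32, 3])   # [time, batch, features]
valid_inputs = tf.placeholder(tf.float32, [100, 32, 3])

train_output, _ = lstm(train_inputs, training=True)    # dropout active
valid_output, _ = lstm(valid_inputs, training=False)   # dropout skipped

# Both call sites read the same kernel variable owned by `lstm`,
# so no weight-sharing machinery or second graph is needed.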
