I am trying to use dropout with CudnnLSTM (tf.contrib.cudnn_rnn.python.layers.CudnnLSTM), and I would like to build just one graph, setting the dropout to some non-zero fraction for training and then to 0 when measuring validation error metrics. With the usual TensorFlow LSTM cell (tf.contrib.rnn.LSTMCell) this is not too difficult, because the keep_prob used for dropout (e.g. through tf.contrib.rnn.DropoutWrapper) accepts a Tensor, but I find this is not an option for CudnnLSTM, whose dropout argument is a plain Python float fixed when the layer is constructed.
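For reference, here is a simplified sketch of what I mean by the usual-LSTM approach (the shapes and sizes are placeholders, not my real model): keep_prob is a placeholder, so the same graph can run with dropout on or off just by feeding a different value.

```python
import tensorflow as tf

# keep_prob defaults to 1.0 (no dropout); feed something like 0.5 for training.
keep_prob = tf.placeholder_with_default(1.0, shape=[])
inputs = tf.placeholder(tf.float32, [None, None, 64])  # [batch, time, features]

cell = tf.contrib.rnn.LSTMCell(num_units=128)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# sess.run(train_op, feed_dict={inputs: x, keep_prob: 0.5})  # training
# sess.run(loss,     feed_dict={inputs: x, keep_prob: 1.0})  # validation
```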
To work around this, I tried using a global Python variable to set the dropout and then changing the value of that global between training and validation, but I don't think this works (I can't prove it, but it's my best guess). In particular, my training and validation errors are about the same, whereas in the past, when I trained an RNN with dropout on the same data set, validation error tended to quickly become better than training error (since validation runs with the dropout fraction set to 0). I have seen exactly that with the usual LSTM on this data set, so I expected to see something similar with Cudnn.
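Simplified, my current attempt looks roughly like this (the names and hyper-parameters are placeholders, not my actual code):

```python
import tensorflow as tf
from tensorflow.contrib.cudnn_rnn.python.layers import cudnn_rnn

# DROPOUT is a module-level global that I mutate between training and
# validation. My suspicion is that its value is only read once, when the
# layer (and hence the graph) is built.
DROPOUT = 0.5

inputs = tf.placeholder(tf.float32, [None, None, 64])  # [time, batch, features]
lstm = cudnn_rnn.CudnnLSTM(num_layers=2, num_units=128, dropout=DROPOUT)
outputs, _ = lstm(inputs, training=True)

# ... later, before computing validation metrics ...
DROPOUT = 0.0  # this does not appear to affect the already-built graph
```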
So I have two questions:
- How can I know definitively whether the dropout actually changes as I change the value of the global variable I used to set it? (My guess is that it does not, but if someone tells me I am wrong, how can I verify that?) The git commit history is, at least to me, a little confusing as to whether dropout even works in the layers implementation.
- If setting the dropout via a global doesn't work, and I can't use a Tensor, how can I set the dropout to be different for training and validation? I suppose one way would be to build two graphs that share weights, but how would I do this given that CudnnLSTM creates its own weights rather than having them passed in? Would someone be able to provide a code example, as I have not been able to find one? (A rough sketch of what I have in mind is below, though I don't know whether it is correct.)
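For concreteness, this is the kind of shared-weights construction I am imagining, though I am not at all sure the second layer actually reuses the first one's opaque CuDNN parameter buffer:

```python
import tensorflow as tf
from tensorflow.contrib.cudnn_rnn.python.layers import cudnn_rnn

inputs = tf.placeholder(tf.float32, [None, None, 64])  # [time, batch, features]

# Training version with dropout.
with tf.variable_scope("rnn"):
    train_lstm = cudnn_rnn.CudnnLSTM(num_layers=2, num_units=128,
                                     dropout=0.5, name="lstm")
    train_out, _ = train_lstm(inputs, training=True)

# Validation version with dropout=0, hopefully reusing the same weights.
with tf.variable_scope("rnn", reuse=True):
    valid_lstm = cudnn_rnn.CudnnLSTM(num_layers=2, num_units=128,
                                     dropout=0.0, name="lstm")
    valid_out, _ = valid_lstm(inputs, training=False)
```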
Thanks for any help.