I would like to speed up my LSTM network, but since I am using it for OCR (where sequences have variable length), I cannot use a plain LSTM implementation. That is why I use "tf.nn.dynamic_rnn".
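To make the variable-length setup concrete, here is a minimal sketch in plain Python of the kind of batch I mean: sequences are padded to the longest one, and the true lengths are kept alongside (this is the information "tf.nn.dynamic_rnn" takes through its "sequence_length" argument). The helper names `pad_batch` and `length_mask` are hypothetical, and each time step is a single float here instead of a feature vector, just to keep the example short.

```python
from typing import List, Tuple


def pad_batch(sequences: List[List[float]],
              pad_value: float = 0.0) -> Tuple[List[List[float]], List[int]]:
    """Pad every sequence to the length of the longest one.

    Returns the padded batch and the original (unpadded) lengths.
    Hypothetical helper, for illustration only.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths) if lengths else 0
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


def length_mask(lengths: List[int], max_len: int) -> List[List[int]]:
    """Build a 0/1 mask: 1 for real time steps, 0 for padding."""
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]


# Three "text lines" of different widths (toy values).
batch = [[0.1, 0.2, 0.3], [0.4], [0.5, 0.6]]
padded, lengths = pad_batch(batch)
mask = length_mask(lengths, len(padded[0]))

print(padded)   # fixed-size batch that a fixed-length kernel could consume
print(lengths)  # true lengths, needed to ignore the padded steps
print(mask)
```

With a fixed-length implementation, the padded steps would still be computed, so the lengths (or the mask) would be needed afterwards to discard the outputs at the padded positions. Whether that is enough for CUDNN is exactly what I am unsure about.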
Based on the RNN benchmark in TensorFlow (https://github.com/tensorflow/tensorflow/blob/754048a0453a04a761e112ae5d99c149eb9910dd/tensorflow/contrib/cudnn_rnn/python/kernel_tests/cudnn_rnn_ops_benchmark.py#L77), the CUDNN implementation creates the whole model at once (it does not use the "tf.nn.rnn" structure like the others). I assume it may be impossible to use CUDNN with variable-length sequences, but maybe somebody has succeeded at it?
The second thing is using "tf.nn.bidirectional_dynamic_rnn", as I would like to use a Bi-LSTM for OCR. But this should be resolved once the first part is implemented.
Edit: It looks like "tf.contrib.cudnn_rnn.CudnnLSTM" has a "bidirectional" implementation built in. So the only unknown is whether CUDNN can be used with variable-length input sequences.
Alternatively, any working example that uses 'CudnnLSTM' would be helpful.