
Tensorflow-gpu v1.13.1, CUDA: 10.0, CuDNN: 7.5.1, graphics card: RTX 2080, Ubuntu: 18.04

I am currently trying to train an LSTM model in TensorFlow using CuDNNLSTM, but whenever I run my code I get the following error:

2019-04-28 23:43:48.936154: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-28 23:43:48.936212: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cudnn_rnn_ops.cc:1217 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
  File "/home/nicholas/PycharmProjects/deepLearninginKeras/crypto_currency_predict/crypto.py", line 139, in <module>
    callbacks=[tensorboard, checkpoint])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
     [[{{node cu_dnnlstm/CudnnRNN}}]]
     [[{{node ConstantFoldingCtrl/loss/dense_1_loss/broadcast_weights/assert_broadcastable/AssertGuard/Switch_0}}]]

I am not sure what exactly is causing the problem. Part of it might be that the CUDA version I installed is different from the one reported by my graphics card driver. When I run "nvidia-smi" in the terminal I get the following:

NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1

In my ~/.bashrc, at the bottom of the file, I have the following paths:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda

Any insight would be appreciated. Here is an example layer from my model:

model.add(tf.keras.layers.CuDNNLSTM(128, input_shape=train_x.shape[1:], return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.BatchNormalization())
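
For context, the surrounding model looks roughly like the sketch below. The layer sizes, the dense head, and the optimizer settings here are only indicative placeholders rather than my exact code; train_x and train_y are my prepared training arrays:

import tensorflow as tf

model = tf.keras.models.Sequential()

# First recurrent block (the layer shown above).
model.add(tf.keras.layers.CuDNNLSTM(128, input_shape=train_x.shape[1:], return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.BatchNormalization())

# Final recurrent block, no sequences returned.
model.add(tf.keras.layers.CuDNNLSTM(128))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.BatchNormalization())

# Classification head.
model.add(tf.keras.layers.Dense(2, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(lr=0.001),
              metrics=['accuracy'])

model.fit(train_x, train_y, batch_size=64, epochs=10)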

Would it be best to go back to Ubuntu 16.04, for instance, or would that not rectify the problem? This seems to be a very common problem with the RTX 20xx cards.

Nicholas

1 Answer


For me, adding the following configuration before fitting solved the problem:

import tensorflow as tf

# Let GPU memory grow as needed instead of pre-allocating all of it.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.Session(config=config)
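
If creating the session on its own is not picked up by tf.keras automatically, you can also hand the configured session to the Keras backend explicitly (tf.keras.backend.set_session is available in TF 1.x); this is an optional extra step, not something the fix above strictly requires:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Register the session with the tf.keras backend so model.fit() uses it.
tf.keras.backend.set_session(sess)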
Yida Lin