
I'm using Anaconda (in Ubuntu 18.04) and I have an environment with Keras (and tensorflow-gpu) installed. Here are the different versions:

  • Keras: 2.2.4
  • Tensorflow-GPU: 1.15.0
  • CuDNN: 7.6.5 for Cuda10.0.0
  • CudaToolKit: 10.0.130
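
For reference, the versions above can also be read programmatically from inside the activated environment. This is a minimal sketch, assuming `conda` is on the PATH and that `conda list --json` returns a list of objects with `name` and `version` keys; the function names are made up for illustration.

```python
import json
import subprocess

# Package names taken from the list above; adjust to your environment.
PACKAGES = ("keras", "tensorflow-gpu", "cudnn", "cudatoolkit")


def pick_versions(conda_list, names=PACKAGES):
    """Return {name: version} for the requested packages.

    `conda_list` is the parsed output of `conda list --json`:
    a list of dicts that each carry "name" and "version" keys.
    """
    wanted = {n.lower() for n in names}
    return {
        pkg["name"]: pkg["version"]
        for pkg in conda_list
        if pkg.get("name", "").lower() in wanted
    }


def active_env_versions():
    """Query the active conda environment (requires `conda` on PATH)."""
    out = subprocess.run(
        ["conda", "list", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return pick_versions(json.loads(out))


if __name__ == "__main__":
    try:
        for name, version in sorted(active_env_versions().items()):
            print(f"{name}: {version}")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("conda is not available in this shell")
```

Running it once per environment shows whether each environment carries its own `cudatoolkit` and `cudnn` packages.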

The versions were chosen by Conda, but I'm wondering why it downloaded 10.0 when nvidia-smi shows me that my Cuda should be (or is?) 10.1:

NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1

But, fun fact, when I do nvcc --version:

Cuda compilation tools, release 9.1, V9.1.85

So here comes my question(s): what version of Cuda am I using? What version of Cuda should I be using? Does Anaconda handle Cuda by environment?

PS: (this is not my question, but why I ask it)

I'm asking that because I'm running into this issue:

tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I looked for a solution (could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR) but none of the answers I tried worked (deleting files, running with sudo, etc.), so I think it's a compatibility issue.

talonmies
FoxYou
  • Did you try: 'export PATH=/usr/local/cuda-10.1/bin:/usr/local/cuda-10.1/NsightCompute-2019.1${PATH:+:${PATH}}' and 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' – Berkay Jan 10 '20 at 18:55
  • @Berkay I actually don't have the folder /usr/local/cuda-XX – FoxYou Jan 10 '20 at 18:58
  • You should give this a try: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions – Berkay Jan 10 '20 at 19:01
  • I think the problem is with the CUDA installation. I have the folder you mentioned. @FoxYou – Berkay Jan 10 '20 at 19:07
  • @Berkay I'll try to install CUDA (and my drivers) again, following the post-installation you sent me and I'll come back to you later, thanks! – FoxYou Jan 10 '20 at 19:11
  • You are welcome! Maybe you should install drivers from software & updates/additional drivers section "https://itsfoss.com/install-additional-drivers-ubuntu/". Let me know the updates. – Berkay Jan 10 '20 at 19:25
  • @Berkay okay, we made some progress here! nvcc -V shows the same version as nvidia-smi (Cuda 10.2). Unfortunately, my issue is still showing but now at least it makes some sense. I think that Cuda 9.1 is still installed somewhere though. Weirdly, Conda continues to install version 10, any clue why? – FoxYou Jan 10 '20 at 20:22
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/205774/discussion-between-foxyou-and-berkay). – FoxYou Jan 10 '20 at 20:27
  • @Berkay, why do you think he ought to check PATH and LD_LIBRARY_PATH if tensorflow searches for these libs inside the conda environment, I mean the `cudatoolkit` and `cudnn` packages? – ivan866 Aug 13 '20 at 16:10

1 Answer


Note: although I do not consider this answer THE solution, it allowed me to continue working on my project, so it's good enough for the moment.

  1. Reinstall Cuda 10.1 (not 10.2 in my case, because of an issue between driver 440 and Steam). Check which version your nvidia driver is and be sure to install the Cuda release that matches it.
  2. Follow the post-installation: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
  3. Use whereis cuda to find out whether other versions are left on the system (in my case, I had cuda-dev-9.1, which explains why nvcc -V showed that version)
  4. Delete all old versions
  5. Normally, nvcc -V and nvidia-smi should show the same Cuda version
  6. Reinstall cudnn if needed
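
The check in step 5 can be sketched in Python as below. This is only an illustration under stated assumptions: the regular expressions are written against the two output lines quoted in the question, and the function names are made up. Keep in mind that the "CUDA Version" shown by nvidia-smi is the highest version the installed driver supports, whereas nvcc reports the toolkit that is first on your PATH, so the two can legitimately differ.

```python
import re
import subprocess


def parse_nvcc_release(text):
    """Extract '9.1' from 'Cuda compilation tools, release 9.1, V9.1.85'."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_smi_cuda(text):
    """Extract '10.1' from the 'CUDA Version: 10.1' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(cmd):
    """Return a command's stdout, or '' if it is missing or fails."""
    try:
        return subprocess.run(
            cmd, check=True, capture_output=True, text=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return ""


def compare_versions():
    """Return (toolkit, driver, same) for the tools found on this machine."""
    toolkit = parse_nvcc_release(run(["nvcc", "--version"]))
    driver = parse_smi_cuda(run(["nvidia-smi"]))
    return toolkit, driver, toolkit is not None and toolkit == driver


if __name__ == "__main__":
    toolkit, driver, same = compare_versions()
    print(f"nvcc toolkit: {toolkit}, nvidia-smi: {driver}, match: {same}")
```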

Now, this doesn't fix the bug:

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

A working solution (but still not awesome) is to add the following code at the top of your Python file (I use Keras, but it works with TensorFlow alone as well):

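# TensorFlow 1.x / Keras 2.2.x API
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all at start-up
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# Make Keras use this session
set_session(sess)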

And it's (apparently) working!
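
The same on-demand allocation can apparently also be requested through an environment variable instead of a session config. This is a minimal sketch, assuming your TensorFlow build honours `TF_FORCE_GPU_ALLOW_GROWTH` (it is described in TensorFlow's GPU guide; I have not verified it against 1.15). The helper name is hypothetical.

```python
import os


def force_gpu_allow_growth(env=None):
    """Set TF_FORCE_GPU_ALLOW_GROWTH so GPU memory is allocated on demand.

    Hypothetical helper. It has to run before TensorFlow initialises
    the GPU, so call it before importing tensorflow or keras.
    """
    env = os.environ if env is None else env
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


force_gpu_allow_growth()
# import tensorflow as tf  # import only after the variable is set
```

Setting the variable in the shell before starting Python should have the same effect and leaves the script untouched.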

Big thanks to Berkay for his support!

(Ideally, delete the old versions before installing the new one, but doing it in the order above works too.)

FoxYou
  • 'reinstall cudnn' - you mean the python module inside the environment or the standalone package downloaded from nvidia website after registering the developer program ? – ivan866 Aug 13 '20 at 16:12
  • @ivan866 I think I had to reinstall the cudnn from Nvidia website – FoxYou Aug 16 '20 at 12:27