24

I have a laptop with a GeForce 940MX. I want to get TensorFlow up and running on the GPU. I installed everything from their tutorial page, but now when I import TensorFlow, I get:

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: 
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH: 
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
>>> 

After this, I think it just falls back to running on the CPU.

EDIT: I removed everything and started again from scratch. Now I get this:

>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Qubix
  • Have you actually installed the NVIDIA driver? libcuda is part of the driver, not the CUDA toolkit – talonmies Jan 27 '17 at 09:31
  • Locate the file using `find /usr/ -name 'libcuda.so.1'`. Is it in the standard CUDA library directories I assume you added to `LD_LIBRARY_PATH`? If not, just create a symbolic link to it in the CUDA lib directory. – pSoLT Jan 27 '17 at 09:35
  • /usr/lib/x86_64-linux-gnu/libcuda.so.1 and /usr/lib/i386-linux-gnu/libcuda.so.1. Where exactly is the cuda lib directory? – Qubix Jan 27 '17 at 09:37
  • I will repeat: "could not open driver version path for reading: /proc/driver/nvidia/version" means that you don't have a functional CUDA driver at the time you are running TensorFlow – talonmies Jan 27 '17 at 10:19
  • I reinstalled everything and updated the error. – Qubix Jan 27 '17 at 11:55
  • The error is still the same error and the problem is still the same problem. – talonmies Jan 27 '17 at 13:29
  • So the solution is not the proposed one. – Qubix Jan 27 '17 at 14:03
  • This does not look like a TensorFlow problem; instead, it looks like you don't have a correctly installed and running NVIDIA driver. One test: try running "nvidia-smi". It should print a list of the available GPUs if the driver is installed correctly. – Peter Hawkins Jan 31 '17 at 14:35
  • Hi, did you manage to get it running on this GPU? I want to buy a laptop with it and wonder if it will work with TensorFlow – Moshe Kravchik Jul 11 '17 at 13:27
  • @MosheKravchik, yes, I just had to reinstall CUDA and cuDNN. It does work, but it's slow. – Qubix Jul 11 '17 at 13:53
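
The two checks raised in the comments above (is `nvidia-smi` available, and can `/proc/driver/nvidia/version` be read) can be run without importing TensorFlow at all. What follows is only a minimal sketch: the function name `driver_diagnostics` and its dictionary keys are made up for illustration, and it does not prove the driver works; it only reports whether the two things mentioned in the thread are present.

```python
import os
import shutil


def driver_diagnostics(version_path="/proc/driver/nvidia/version"):
    """Report basic facts about the NVIDIA driver install (hypothetical helper)."""
    return {
        # Full path of the nvidia-smi executable, or None if it is not on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
        # True if the driver version file is visible to this user.
        "version_file_exists": os.path.exists(version_path),
        # True if this user may read it (the log above reports "Permission denied").
        "version_file_readable": os.access(version_path, os.R_OK),
    }


if __name__ == "__main__":
    for key, value in driver_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi` is `None` or the version file is missing, the problem is most likely the driver installation itself rather than TensorFlow, which matches what the comments say.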

3 Answers

18

libcuda.so.1 is a symlink to a file that is specific to the version of your NVIDIA drivers. It may be pointing to the wrong version or it may not exist.

# See where the link is pointing.
ls -la /usr/lib/x86_64-linux-gnu/libcuda.so.1
# My result:
# lrwxrwxrwx 1 root root 19 Feb 22 20:40 \
# /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> ./libcuda.so.375.39

# Make sure it is pointing to the right version. 
# Compare it with the installed NVIDIA driver.
nvidia-smi

# Replace libcuda.so.1 with a link to the correct version
cd /usr/lib/x86_64-linux-gnu
sudo ln -f -s libcuda.so.<yournvidia.version> libcuda.so.1

Then, in the same way, create a symlink named libcuda.so.1 in one of the directories listed in your LD_LIBRARY_PATH (for example /usr/local/cuda/lib64) that points to /usr/lib/x86_64-linux-gnu/libcuda.so.1.

You may also need to create a link named libcuda.so in /usr/lib/x86_64-linux-gnu that points to libcuda.so.1.
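
To see what the link currently resolves to, and which LD_LIBRARY_PATH directories already contain a libcuda.so.1, something like the following Python sketch may help. The helper names `describe_link` and `find_in_library_path` are invented here for illustration; the script only inspects paths and changes nothing on disk.

```python
import os


def describe_link(path):
    """Return (is_symlink, resolved_target, target_exists) for a library path."""
    is_link = os.path.islink(path)
    # realpath follows the whole chain of symlinks to the final file name.
    target = os.path.realpath(path)
    return is_link, target, os.path.exists(target)


def find_in_library_path(name="libcuda.so.1", env=None):
    """List the LD_LIBRARY_PATH directories that contain an entry called `name`."""
    env = os.environ if env is None else env
    # Empty entries (such as the leading ':' in the log above) are skipped.
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    # lexists also counts dangling symlinks, which are worth knowing about.
    return [d for d in dirs if os.path.lexists(os.path.join(d, name))]


if __name__ == "__main__":
    print(describe_link("/usr/lib/x86_64-linux-gnu/libcuda.so.1"))
    print(find_in_library_path())
```

If `target_exists` comes back `False`, the link is dangling and needs to be recreated against the driver version that is actually installed. An empty list from `find_in_library_path` means none of the LD_LIBRARY_PATH directories has the link yet.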

Alex Payne
  • ***Now in the same way, make another symlink from libcuda.so.1 to a link of the same name in your LD_LIBRARY_PATH directory.*** How exactly is this done? What is my "LD_LIBRARY_PATH directory"? Thanks a lot! – Alex Sep 08 '17 at 13:53
  • I faced the same issue. My NVIDIA driver version is 450.216.04. How should I write it? `sudo ln -f -s libcuda.so.450.216.04 libcuda.so.1`, right? – Redhwan Feb 27 '23 at 05:41
17

In case anyone still encounters this when running inside a Docker container: first make sure to pass the --runtime=nvidia parameter when you run the container.

docker run --runtime=nvidia -t tensorflow/serving:latest-gpu

Here, tensorflow/serving:latest-gpu is the name of the Docker image.

Matthieu Brucher
Rodrigo Loza
8

In the case I just solved, the fix was updating the GPU driver to the latest version and installing the CUDA toolkit. First, add the PPA and install the GPU driver:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-390

After I added the PPA, it listed the available driver versions, and 390 was the latest 'stable' version shown.

Then install the CUDA toolkit:

sudo apt install nvidia-cuda-toolkit

Then reboot:

sudo reboot

This updated the driver to a newer version than the 390 installed in the first step (it became 410; this was on a p2.xlarge instance on AWS).

wordsforthewise