16

I am trying to run Caffe on Ubuntu Linux. After installation, I ran Caffe in GPU mode and got this error:

I0910 13:28:13.606891 10629 caffe.cpp:296] Use GPU with device ID 0
modprobe: ERROR: could not insert 'nvidia_352': No such device
F0910 13:28:13.728612 10629 common.cpp:142] Check failed: error == cudaSuccess (38 vs. 0)  no CUDA-capable device is detected
*** Check failure stack trace: ***
    @     0x7ffd3b9a7daa  (unknown)
    @     0x7ffd3b9a7ce4  (unknown)
    @     0x7ffd3b9a76e6  (unknown)
    @     0x7ffd3b9aa687  (unknown)
    @     0x7ffd3bf91cb5  caffe::Caffe::SetDevice()
    @           0x40a5a7  time()
    @           0x4080f8  main
    @     0x7ffd3aeb9ec5  (unknown)
    @           0x408618  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

My NVIDIA driver is 352.41. I installed nvidia-352, and apt reports that it is already the newest version:

sudo apt-get install nvidia-352
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-352 is already the newest version.
The following packages were automatically installed and are no longer required:
  account-plugin-windows-live libupstart1
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 31 not upgraded.

So Ubuntu has NVIDIA driver 352 installed. Why do I still get this error?

I0910 13:28:13.606891 10629 caffe.cpp:296] Use GPU with device ID 0
modprobe: ERROR: could not insert 'nvidia_352': No such device
F0910 13:28:13.728612 10629 common.cpp:142] Check failed: error == cudaSuccess (38 vs. 0)  no CUDA-capable device is detected

I checked that I have a CUDA-capable device:

lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GK107GL [Quadro K2000] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)

So I do have a CUDA-capable device. Why do I get the error?

EDIT 1: Yes, my test with ./deviceQuery failed as well.

../NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

I checked the /dev directory, and nvidia0 is there:

crwxrwxrwx  1 root root    195,   0 Sep 10 16:51 nvidia0
crw-rw-rw-  1 root root    195, 255 Sep 10 16:51 nvidiactl

nvcc -V gives:

li@li-HP-Z420-Workstation:/dev$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

And the driver version check:

li@li-HP-Z420-Workstation:/dev$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.41  Fri Aug 21 23:09:52 PDT 2015
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) 

What could be wrong?

batuman
  • Leaving aside caffe for a moment, are you even sure your basic CUDA installation works correctly? – talonmies Sep 10 '15 at 06:37
  • Of course, I have installed CUDA 7.5. All libraries and headers are installed in /usr/local/cuda7.5. The paths are exported with `export PATH=/usr/local/cuda-7.5/bin:$PATH` and `export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH`. I installed caffe successfully. – batuman Sep 10 '15 at 07:20
  • That isn't what I asked. Can you compile and successfully run a simple CUDA application, like one of the samples from the CUDA toolkit? – talonmies Sep 10 '15 at 07:40
  • @talonmies, yes, it failed. I added the output in the EDIT. I wonder why? – batuman Sep 10 '15 at 09:30
  • Obviously your CUDA installation is broken. The internet is full of advice and instructions for installing and troubleshooting CUDA. It might be time to go and have a look at some of it. But that really isn't an on-topic question for [SO]. There are probably better places to try and get help on this (Nvidia forums, askubuntu for example) – talonmies Sep 10 '15 at 09:34
  • Yes, good idea, I should ask on the Nvidia forum. Once I solve it, I'll update here. – batuman Sep 10 '15 at 09:41
  • I think I have a problem with CUDA 7.0. I'll install CUDA 6.5 and try again. – batuman Sep 10 '15 at 11:16

5 Answers

11

The problem is now solved. Running sudo dpkg --list | grep nvidia showed that my kernel module was 352.41 but the client libraries were 304.12. So I ran sudo apt-get remove --purge nvidia-*, which removed all the NVIDIA packages, and then installed 352.41 again:

$ sudo add-apt-repository ppa:xorg-edgers/ppa -y
$ sudo apt-get update
$ sudo apt-get install nvidia-352

After that:

$ sudo dpkg --list | grep nvidia
rc nvidia-304 304.128-0ubuntu0~gpu14.04.2 amd64 NVIDIA legacy binary driver - version 304.128
rc nvidia-304-updates 304.125-0ubuntu0.0.2 amd64 NVIDIA legacy binary driver - version 304.125
ii nvidia-352 352.41-0ubuntu0~gpu14.04.1 amd64 NVIDIA binary driver - version 352.41
rc nvidia-opencl-icd-304 304.128-0ubuntu0~gpu14.04.2 amd64 NVIDIA OpenCL ICD
rc nvidia-opencl-icd-304-updates 304.125-0ubuntu0.0.2 amd64 NVIDIA OpenCL ICD
ii nvidia-opencl-icd-352 352.41-0ubuntu0~gpu14.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.6.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-settings 355.11-0ubuntu0~gpu14.04.1 amd64 Tool for configuring the NVIDIA graphics driver

Now the versions match, and ./deviceQuery and everything else work as expected. Thanks.
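
The mixed-version situation described above can be spotted mechanically. Here is a minimal sketch, not part of the original fix: nvidia_driver_series is a hypothetical helper name, and it assumes the Ubuntu package naming seen in the listing (nvidia-NNN, nvidia-NNN-updates, nvidia-opencl-icd-NNN). It reads dpkg --list output on stdin and prints each package state together with the driver series taken from the package name, so leftover rc entries or a second installed series stand out.

```shell
# Hypothetical helper: summarize NVIDIA driver series from `dpkg --list` text on stdin.
# Prints "<state> <series>" once per distinct pair, e.g. "ii 352" or "rc 304".
nvidia_driver_series() {
  awk '
    # Field 1 is the dpkg state (ii = installed, rc = removed but config files remain),
    # field 2 is the package name.
    $2 ~ /^nvidia-(opencl-icd-)?[0-9]+(-updates)?$/ {
      series = $2
      gsub(/[^0-9]/, "", series)   # keep only the digits of the series number
      key = $1 " " series
      if (!(key in seen)) { seen[key] = 1; print key }
    }
  '
}

# Usage on a real system:
#   dpkg --list | nvidia_driver_series
```

If more than one series shows up with state ii, the userspace libraries and the kernel module may come from different driver versions, which is the condition this answer fixed by purging and reinstalling.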

JJ Geewax
batuman
  • http://askubuntu.com/questions/723632/nvidia-7300-le-driver-15-10-how-to-install-ubuntu-modprobe-error-could-not I dunno if it's "fixed". – Wolfpack'08 Jan 21 '16 at 05:03
2

I had this problem too, and reinstalling the NVIDIA drivers didn't solve it.

In the end, I solved it by adding two kernel parameters through GRUB.

Add to:

GRUB_CMDLINE_LINUX_DEFAULT

with:

pci=nocrs pci=realloc

I think this is a conflict between CUDA 7.5 and kernel 3.19.
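
For illustration, here is a minimal sketch of that edit. The function name add_pci_params is hypothetical, and it assumes the usual double-quoted GRUB_CMDLINE_LINUX_DEFAULT="..." line in /etc/default/grub. It works as a filter (file contents on stdin, edited text on stdout), so the result can be reviewed before anything is overwritten.

```shell
# Hypothetical helper: append the two PCI parameters to GRUB_CMDLINE_LINUX_DEFAULT.
# Reads the contents of /etc/default/grub on stdin and writes the edited text to stdout.
# Lines that already contain pci=nocrs are left alone, so running it twice is harmless.
add_pci_params() {
  sed '/^GRUB_CMDLINE_LINUX_DEFAULT="/{
/pci=nocrs/!s/"$/ pci=nocrs pci=realloc"/
}'
}

# Usage on a real system:
#   add_pci_params < /etc/default/grub > /tmp/grub.new   # review /tmp/grub.new first
#   sudo cp /tmp/grub.new /etc/default/grub
#   sudo update-grub                                     # then reboot
```

The parameters only take effect after update-grub has regenerated the boot configuration and the machine has been rebooted.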

Tshilidzi Mudau
SPWW
  • Also on [https://devtalk.nvidia.com/default/topic/838768/problems-with-geforce-gtx-980-on-asustek-g20aj/] nvidia forums! – Chang Hyun Park Nov 15 '15 at 06:39
  • Add `pci=nocrs pci=realloc` in `/etc/default/grub`, then run `update-grub` as root. – Chang Hyun Park Nov 15 '15 at 06:40
  • GeForce 7 Series 7300 LE: `$ lspci -vnn | grep -i VGA -A 12 > Kernel driver in use: nvidia`. Good. But `startx` and ubuntu-desktop GUI login still result in a screen without Unity. : – Wolfpack'08 Jan 21 '16 at 05:12
0

Another option is to install the driver from the .run file. This requires stopping the X server first, as follows:

1. Make sure you are logged out.
2. Press CTRL+ALT+F1 and log in with your credentials.
3. Stop your current X server session with sudo service lightdm stop or sudo stop lightdm.
4. Enter runlevel 3 (or 5) with sudo init 3 (or sudo init 5) and install your .run file.
5. You might be required to reboot when the installation finishes. If not, run sudo service lightdm start or sudo start lightdm to start your X server again.

Run the .run file with sudo sh xxxxx.run.

You may get the error "The distribution-provided pre-install script failed! Are you sure you want to continue?". If so, abort the installation, disable the Nouveau kernel driver, and rebuild the initramfs with sudo update-initramfs -u.

Then reboot the system, stop the X server again, enter runlevel 3, and run sudo sh xxxx.run again.

This time you can ignore the pre-install script failure message and continue. You will then be able to install the NVIDIA driver from the .run file.
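
The answer names the Nouveau driver but only shows the initramfs rebuild. A common way to keep Nouveau from loading is a modprobe blacklist file, sketched below. The helper name write_nouveau_blacklist and the file name blacklist-nouveau.conf are assumptions for illustration, not taken from the answer; the two config lines themselves are standard modprobe.d syntax.

```shell
# Hypothetical helper: write a modprobe config that keeps the nouveau driver from loading.
# $1 is the target file; on Ubuntu this would normally be a file under /etc/modprobe.d/.
write_nouveau_blacklist() {
  cat > "$1" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Usage on a real system (as root), followed by the rebuild step from the answer:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   update-initramfs -u
```

Taking the target path as an argument makes it easy to write the file somewhere harmless first and inspect it before touching /etc.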

batuman
  • If Ubuntu is installed in UEFI mode, you need to disable fast boot and secure boot so that the driver can be loaded. – batuman Jul 03 '19 at 00:52
0

If your display is driven by a non-NVIDIA device but you have the NVIDIA driver installed, you have to install the driver with the --no-opengl-files flag for GNOME to work.

I suggest downloading the driver separately and installing it manually from a console:

1. Press Ctrl+Alt+F2 (or F3/F4/F5) to get to a console.
2. Run init 3 to stop the UI.
3. Log in again at the console if necessary.
4. wget http://us.download.nvidia.com/tesla/418.67/NVIDIA-Linux-x86_64-418.67.run
5. sh NVIDIA-Linux-x86_64-418.67.run --no-opengl-files
6. After installation, reboot.
batuman
-1

I also had this problem, and the answers above didn't work for me. Installing the latest driver (nvidia-364) fixed it. Commands to run:

sudo add-apt-repository ppa:xorg-edgers/ppa 
sudo apt-get update 
sudo apt-get install nvidia-364

I think the problem occurs when different versions of gcc were used to compile the driver modules and the Linux kernel.
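
One way to check this guess is to compare the compiler versions that the kernel and the NVIDIA module report. This is a rough sketch: gcc_version_of is a hypothetical helper name, and it assumes the "gcc version X.Y.Z" wording shown in the question's /proc/driver/nvidia/version output. Newer kernels may format /proc/version differently, in which case the helper prints nothing.

```shell
# Hypothetical helper: print the first "gcc version X.Y.Z" number found in text on stdin.
gcc_version_of() {
  grep -o 'gcc version [0-9][0-9.]*' | head -n 1 | awk '{ print $3 }'
}

# Usage on a real system:
#   gcc_version_of < /proc/version                 # compiler that built the running kernel
#   gcc_version_of < /proc/driver/nvidia/version   # compiler that built the NVIDIA module
```

If the two numbers differ, that is consistent with the explanation above, though it does not prove that the mismatch is the cause.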

Tshilidzi Mudau