16

The NVIDIA-SMI is throwing this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

I purged NVIDIA and installed it again following steps mentioned here.

My device specs are as follows:

  • Server with a Tesla M40
  • Running on Ubuntu 16.04
  • Kernel version Linux 4.4.0-116-generic x86_64
  • Driver: nvidia-384

Can someone please help in solving the error?

Zoe

5 Answers

9

The issue might be due to a confirmed "bug" in the 4.4.0-116 kernel patch. I ran into the same issue with nvidia-390. I followed the instructions here and managed to solve the problem while keeping the newer NVIDIA driver. In general, use the following steps:

  1. If you cannot log in to the desktop and get stuck in a login loop, press Ctrl + Alt + F1 to log in from the command line.
  2. Check whether your gcc version is outdated and update it if so: gcc --version
  3. If the gcc version is 5+, uninstall the nvidia driver first: sudo apt-get remove nvidia-390
  4. Purge the 4.4.0-116 kernel: sudo apt-get purge linux-headers-4.4.0-116 linux-headers-4.4.0-116-generic linux-image-4.4.0-116-generic linux-image-extra-4.4.0-116-generic linux-signed-image-4.4.0-116-generic
  5. Reinstall the kernel: sudo apt-get install linux-generic linux-signed-generic
  6. Reinstall nvidia-390: sudo apt-get install nvidia-390
  7. Check whether the problem is solved by running modinfo nvidia-390 -k 4.4.0-116-generic | grep vermagic; retpoline should show up in the output this time
  8. Reboot: sudo reboot
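
The check in step 7 can be scripted. This is only a minimal sketch: the helper name has_retpoline is made up for illustration, and it assumes the vermagic value is a space-separated list of flags, as modinfo normally prints it.

```shell
#!/bin/sh
# has_retpoline is a hypothetical helper, not part of any NVIDIA or kmod tool.
# It succeeds when the given vermagic string contains the word "retpoline".
has_retpoline() {
    # Pad with spaces so only a whole word matches.
    case " $1 " in
        *" retpoline "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage (module name taken from step 7; adjust it to your driver):
#   vermagic=$(modinfo -F vermagic -k 4.4.0-116-generic nvidia-390)
#   if has_retpoline "$vermagic"; then echo "module built with retpoline"; fi
```

The -F option makes modinfo print only the requested field, which avoids the grep in step 7.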

Hope this works for you and other people who run into the same issue. The post in the forum saved my weekend.

Rex Wang
  • I'm getting an error at step 7. modinfo: ERROR: Module alias nvidia-387 not found. aerin@capa:~$ libkmod: ERROR ../libkmod/libkmod.c:586 kmod_search_moddep: could not open moddep file '/lib/modules/4.4.0-116-generic/modules.dep.bin' – aerin Jun 16 '18 at 07:50
  • 1
    Sorry, I didn't make it clear: that step's command depends on your version of the NVIDIA driver. I forgot to change the version from the other post; edited now. – Rex Wang Jun 26 '18 at 14:05
6

Note: this answer is from 2018 and works for Ubuntu 16.04, which is very much out-of-date. Don't try this on recent Ubuntu versions.

Try

  1. Download the driver from here
  2. sudo apt-get purge 'nvidia*' - removes your current NVIDIA installation (the quotes keep the shell from expanding the pattern)
  3. sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb - installs the package you downloaded earlier
  4. sudo apt-get update
  5. sudo apt-get install cuda-drivers

After this, reboot your computer. When it's up again, the nvidia-smi command should run smoothly.
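
If nvidia-smi still complains after the reboot, it can help to check whether the kernel module is loaded at all. A rough sketch: the function name nvidia_module_listed is hypothetical, and it assumes the module appears in lsmod either as nvidia or with a version suffix such as nvidia_384, which is how some Ubuntu packages of that era seem to name it.

```shell
#!/bin/sh
# nvidia_module_listed is a hypothetical helper for illustration.
# It reads lsmod-style text on stdin and succeeds if the first column of
# any line is "nvidia" or "nvidia_<digits>" (the suffix is an assumption).
nvidia_module_listed() {
    awk '$1 ~ /^nvidia(_[0-9]+)?$/ { found = 1 } END { exit !found }'
}

# Possible usage:
#   if lsmod | nvidia_module_listed; then
#       echo "NVIDIA kernel module is loaded"
#   else
#       echo "NVIDIA kernel module is not loaded"
#   fi
```

If the module is not listed, the driver was installed but never loaded, which points at the kernel or module build rather than at nvidia-smi itself.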

TamaMcGlinn
bluesummers
2

To install the latest driver as of this answer:

    sudo apt install libnvidia-compute-435
    sudo apt install libnvidia-gl-435 nvidia-dkms-435 nvidia-kernel-source-435 \
        nvidia-utils-435 xserver-xorg-video-nvidia-435 libnvidia-ifr1-435
    sudo apt install nvidia-driver-435
    sudo reboot

and then:

    nvidia-smi
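
To confirm which driver version the kernel actually loaded, the driver exposes a version file under /proc when its module is loaded. The sketch below is an assumption-laden example: driver_version is a made-up helper name, and the sed pattern assumes the usual "NVRM version: ... Kernel Module  <version>" line, which may be laid out differently on other driver builds.

```shell
#!/bin/sh
# driver_version is a hypothetical helper for illustration.
# It reads the contents of /proc/driver/nvidia/version on stdin and prints
# the version number that follows "Kernel Module" on the NVRM line.
driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Possible usage (the file only exists while the driver module is loaded):
#   if [ -r /proc/driver/nvidia/version ]; then
#       driver_version < /proc/driver/nvidia/version
#   else
#       echo "NVIDIA driver module is not loaded"
#   fi
```
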
Mike Beck
0

If you're running this on Google Colab, just go to Runtime > Change Runtime Type > select GPU. That worked for me.

vine_J
0

Duplicate topic.

There is no single problem responsible for this message, so there is no single solution.

This link has more solutions and is older:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

Ehsan Paknejad