I just received this message on my Ubuntu server:

Failed to initialize NVML: Driver/library version mismatch

when typing

watch nvidia-smi

I am running Ubuntu Server (Ubuntu 18.04.5 LTS), and everything was working correctly yesterday.

So the question is: did Ubuntu Server automatically update the NVIDIA driver without asking me for permission? How would I confirm this? I don't want automatic updates!

I realize that a reboot will fix this, but this is a server doing lots of other stuff, so rebooting in the middle of the week is not allowed :)

thanks!

vgoklani
  • I had the same issue w/ Ubuntu 20.04. Rebooting worked for me too. – vpap May 17 '21 at 15:35
  • Rebooting is a definite fix, but I don't like being forced to reboot as other processes have to be stopped and then restarted. – vgoklani May 17 '21 at 19:05

2 Answers

I had the same problem and solved it as follows:

1. Check the version of the loaded NVIDIA kernel driver: cat /proc/driver/nvidia/version
2. Check whether the driver package has been upgraded: grep nvidia /var/log/dpkg.log
3. On my machine, the driver had been upgraded from 415 to 418.
4. Reinstall the driver version you want, then hold it so that apt does not upgrade it again: apt-mark hold nvidia-415 (use the exact package name that appears in your dpkg log).
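The checks above can be sketched as a small script. This is a rough sketch, not a definitive tool: the function names `kernel_module_version` and `nvidia_upgrades` are made up for illustration, it assumes the usual layout of the "NVRM version:" line and of dpkg log entries, and the package name in the final comment is an assumption that you should replace with whatever your own log shows.

```shell
#!/usr/bin/env bash

# Print the version of the loaded NVIDIA kernel module, read from text in the
# format of /proc/driver/nvidia/version (the "NVRM version:" line).
kernel_module_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# List NVIDIA package upgrades recorded in text in the format of
# /var/log/dpkg.log: "<date> <time> upgrade <package> <old> <new>".
nvidia_upgrades() {
    awk '$3 == "upgrade" && $4 ~ /nvidia/ { print $1, $2, $4, $5, "->", $6 }'
}

# Only read the real files when they exist, so the script is safe to run
# on a machine without the NVIDIA driver loaded.
if [ -r /proc/driver/nvidia/version ]; then
    echo "Loaded kernel module: $(kernel_module_version < /proc/driver/nvidia/version)"
fi

if [ -r /var/log/dpkg.log ]; then
    nvidia_upgrades < /var/log/dpkg.log
fi

# To stop apt from upgrading the driver again (package name is an assumption):
#   sudo apt-mark hold nvidia-driver-415
```

If the upgrade list shows a newer version than the loaded kernel module, the user-space libraries were upgraded underneath the running driver, which is what produces the mismatch. Older entries may sit in rotated logs such as /var/log/dpkg.log.1.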

Dc_Neo
  • How is this "solved"? Did your machine just randomly decide to upgrade the driver? – vgoklani Nov 16 '21 at 13:07
  • No, I updated some other software, and the dependency system upgraded the driver along with it. You can check that, or just hold the package with 'apt-mark hold XXX'. – Dc_Neo Nov 22 '21 at 01:54

I got the same issue and resolved it by installing the right GPU driver.

You need to install the CUDA Toolkit and cuDNN. Please refer to the official docs for details. The driver will be installed automatically.

Note: the latest CUDA version is 12.1, which might not be compatible with the latest torch, in which case you will have to build torch on your own.

Otherwise, install the 11.x version.

4t8dds