
I am running a cloud instance on a GPU node. I installed CUDA, and nvidia-smi showed the driver details and memory utilization. After a couple of days, I started getting this error: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running".

I installed the latest driver (NVIDIA 375.39 for Tesla M40 GPUs), but I still face the same issue. Is there any way to i) debug why nvidia-smi is not able to communicate with the driver, and ii) check whether the driver is running properly?
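
For reference, a rough sketch of the checks I have found so far, assuming a standard Linux install (I am not certain these are sufficient):

    # i) Check whether the nvidia kernel module is loaded at all
    lsmod | grep nvidia

    # i) Look for driver load errors in the kernel log
    dmesg | grep -i nvidia

    # ii) If the driver is running, this file exists and reports its version
    cat /proc/driver/nvidia/version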

Please follow [this link](https://www.cyberciti.biz/faq/ubuntu-linux-install-nvidia-driver-latest-proprietary-driver/). For me, installing the latest compatible nvidia-driver followed by a reboot worked. – Arvind N Aug 05 '20 at 15:11

2 Answers


This is an operating-system issue, so the solution depends on which OS you are running. For example, if you are running Ubuntu 16, the fix might look like this:

Uninstall / purge all existing NVIDIA drivers:

    # Quote the glob so the shell passes it to apt rather than expanding it locally
    sudo apt-get remove --purge 'nvidia*' && sudo apt autoremove

Download the NVIDIA driver from NVIDIA's website (the .run file) and install it, as sketched below.
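
A minimal sketch of the remaining steps, assuming the downloaded file is named NVIDIA-Linux-x86_64-375.39.run (substitute whatever file you actually downloaded) and that no X server is running:

    # Make the installer executable (filename is an example, adjust to yours)
    chmod +x NVIDIA-Linux-x86_64-375.39.run

    # Run the installer; it builds and installs the kernel module
    sudo ./NVIDIA-Linux-x86_64-375.39.run

    # Reboot, then run nvidia-smi again to confirm the driver responds
    sudo reboot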


I had the same problem and solved it by changing a firmware security option: reboot the system, enter the BIOS, set Secure Boot to Disabled, then reboot again. After that it worked. (The proprietary NVIDIA kernel module is unsigned by default, so Secure Boot can block it from loading.)
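
If you want to verify the Secure Boot state from the running system before rebooting into the BIOS, one option is mokutil (assuming the mokutil package is installed; on Ubuntu: sudo apt install mokutil):

    # Prints "SecureBoot enabled" or "SecureBoot disabled"
    mokutil --sb-state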