I am running a cloud instance on a GPU node. I installed CUDA, and nvidia-smi showed the driver details and memory utilization. After a couple of days, I started getting this error: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running".
I installed the latest driver (NVIDIA 375.39 for Tesla M40 GPUs), but I still face the same issue. Is there any way to i) debug why nvidia-smi is not able to communicate with the driver, and ii) check whether the driver is running properly?
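For reference, the kind of check I have in mind could be sketched roughly as below. This is only a minimal sketch, assuming a Linux instance where a loaded driver shows up as an `nvidia` entry in `/proc/modules` and exposes `/proc/driver/nvidia/version`. The function names `nvidia_module_loaded` and `driver_status` are hypothetical helpers made up for illustration, not part of any NVIDIA tool.

```python
from pathlib import Path


def nvidia_module_loaded(modules_text: str) -> bool:
    """Return True if some line of /proc/modules names exactly 'nvidia'.

    Each line of /proc/modules starts with the module name, so related
    modules such as 'nvidia_uvm' are deliberately not counted here.
    """
    return any(
        line.split()[0] == "nvidia"
        for line in modules_text.splitlines()
        if line.strip()
    )


def driver_status(proc_root: str = "/proc") -> dict:
    """Collect basic facts about the NVIDIA kernel module (hypothetical helper).

    proc_root is a parameter only so the function can be pointed at a
    different directory; on a real system it stays at '/proc'.
    """
    root = Path(proc_root)
    modules = root / "modules"
    version = root / "driver" / "nvidia" / "version"

    modules_text = modules.read_text() if modules.exists() else ""
    return {
        # Is the kernel module currently loaded?
        "module_loaded": nvidia_module_loaded(modules_text),
        # Version string reported by the loaded driver, or None if absent.
        "version": version.read_text().strip() if version.exists() else None,
    }


if __name__ == "__main__":
    print(driver_status())
```

If `module_loaded` comes back `False`, that would suggest the kernel module is not loaded at all (for example after a reboot or kernel update), rather than nvidia-smi itself being broken, but I am not sure this is the right way to narrow it down.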