I'm using Container-Optimized OS (COS) to run an application that uses GPUs. A separate system creates VMs on demand to run this application (to minimize cost), and I've been trying to reduce the time it takes to get the application running.
To do this, I've started using a custom VM image, which at the moment is just the COS image with my application's Docker container pre-downloaded and saved to it. I would also like to pre-install the NVIDIA GPU drivers, but I can't get the installation to stick. I install the drivers, verify that they work, and then create the image, but when I create a new VM from that image it's as if the drivers were never installed, even though the files all appear to be present. I've tried running
sudo cos-extensions install gpu
in the startup script when creating the image, but instances created from my image return an error when I try to run nvidia-smi.
The NVIDIA mount commands and the nvidia-smi invocation:
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
/var/lib/nvidia/bin/nvidia-smi
Error:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
Despite this complaint, libnvidia-ml.so DOES exist in /var/lib/nvidia/lib64.
The contents of my /var/lib/nvidia directory are:
$ ls -lh /var/lib/nvidia/
total 354M
-rw-r--r-- 1 root root 354M Mar 10 23:12 NVIDIA-Linux-x86_64-470.141.03_101-17162-40-42.cos
drwxr-xr-x 2 root root 4.0K Mar 10 23:12 bin
drwxr-xr-x 3 root root 4.0K Mar 10 23:12 bin-workdir
drwxr-xr-x 2 root root 4.0K Mar 10 23:12 drivers
drwxr-xr-x 3 root root 4.0K Mar 10 23:12 drivers-workdir
drwxr-xr-x 3 root root 4.0K Mar 10 23:12 firmware
drwxr-xr-x 4 root root 4.0K Mar 10 23:12 lib64
drwxr-xr-x 3 root root 4.0K Mar 10 23:12 lib64-workdir
-rw-r--r-- 1 root root 2.2K Mar 10 23:12 nvidia-installer.log
-rw-r--r-- 1 root root 1.2K Mar 10 23:12 pubkey.der
drwxr-xr-x 3 root root 4.0K Mar 10 23:12 share
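To narrow down whether the problem is missing files or driver state that did not survive imaging, a minimal diagnostic sketch could look like the following. The function name check_driver_state and the NVIDIA_ROOT variable are made up for illustration. The script only checks for the library under lib64, an nvidia entry in /proc/modules, and /dev/nvidia* device nodes; it makes no claim about why any of these checks would fail.

```shell
#!/bin/sh
# Hypothetical helper (not part of COS or the NVIDIA tooling): reports
# whether the driver files are on disk and whether the driver is active.
NVIDIA_ROOT="${NVIDIA_ROOT:-/var/lib/nvidia}"

check_driver_state() {
  root="$1"

  # 1. Is libnvidia-ml.so (any version suffix) present under lib64?
  if ls "$root"/lib64/libnvidia-ml.so* >/dev/null 2>&1; then
    echo "lib: present"
  else
    echo "lib: missing"
  fi

  # 2. Is the nvidia kernel module currently loaded?
  if grep -q '^nvidia ' /proc/modules 2>/dev/null; then
    echo "module: loaded"
  else
    echo "module: not loaded"
  fi

  # 3. Do the NVIDIA device nodes exist?
  if ls /dev/nvidia* >/dev/null 2>&1; then
    echo "devices: present"
  else
    echo "devices: missing"
  fi
}

check_driver_state "$NVIDIA_ROOT"
```

Running this on both the VM used to build the image and a VM created from the image, then comparing the two outputs, should show which part of the installation is not carried over.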
Is there a way to create a custom image with the NVIDIA drivers pre-installed that I can use?