I use custom images (AMIs) configured for machine learning on GPU-enabled EC2 instances.
This means cuda, libcudnn6, nvidia-docker, etc. are all properly set up on them.
However, when Kops starts new nodes from these AMIs (I use cluster-autoscaler), it overwrites my properly configured Docker installation.
How can I prevent that?
For now I run a custom script on startup that re-installs nvidia-docker properly, but that's obviously not ideal.
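For context, the workaround looks roughly like this (a hypothetical sketch, not my exact script — it assumes a Debian/Ubuntu AMI with the nvidia-docker apt repository already configured, run on boot via user data or a systemd unit):

```shell
#!/bin/bash
# Hypothetical boot-time workaround: after kops has provisioned its own
# Docker, re-install the NVIDIA runtime packages it clobbered.
set -euo pipefail

# Re-install nvidia-docker2 (assumes the nvidia-docker apt repo is
# already present in the AMI).
apt-get update
apt-get install -y --reinstall nvidia-docker2

# Restart Docker so it picks up the nvidia runtime configuration again.
systemctl restart docker
```

This works, but it races against node readiness and adds startup latency, which is why I'd rather stop Kops from touching Docker in the first place.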