
I'm trying to figure out how to use nvidia-docker (https://github.com/NVIDIA/nvidia-docker) using https://docs.ansible.com/ansible/latest/docker_container_module.html#docker-container.

Problem

My current Ansible playbook runs my container using the "docker" command instead of "nvidia-docker".

What I have done

Based on some reading, I tried adding my devices explicitly, without success:

    docker_container:
      name: testgpu
      image: "{{ image }}"
      devices: ['/dev/nvidiactl', '/dev/nvidia-uvm', '/dev/nvidia0', '/dev/nvidia-uvm-tools']
      state: started

Note: I tried different syntaxes for devices (inline, etc.), but I still get the same problem.

This task does not throw any error. As expected, it creates a Docker container from my image and tries to start it.

Looking at my container logs:

    terminate called after throwing an instance of 'std::runtime_error'
      what(): No CUDA driver found

This is exactly the same error I get when running `docker run -it <image>` instead of `nvidia-docker run -it <image>`.

Any ideas on how to override the docker command when using docker_container with Ansible?

I can confirm that my CUDA drivers are installed and that all the /dev/nvidia* paths are valid.

Thanks

PERPO
  • If you look closely, I already linked everything you mentioned in my previous message. All of that information is in the first paragraph of the Ansible docs for docker_container. – PERPO Jan 09 '18 at 06:14

1 Answer


The docker_container module doesn't use the docker executable; it talks to the Docker daemon API through the docker-py Python library. That is why there is no "docker" command for you to override.

Looking at the nvidia-docker wrapper script, it sets --runtime=nvidia and -e NVIDIA_VISIBLE_DEVICES.

To set NVIDIA_VISIBLE_DEVICES, you can use the env argument of docker_container.
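A minimal sketch of the env part, reusing the task from the question. The value `all` is an assumption about what you want (expose every GPU); a comma-separated list of GPU indices should also be accepted. Keep in mind that the variable only has an effect when the container actually runs under the nvidia runtime.

```yaml
- name: Start GPU container with NVIDIA_VISIBLE_DEVICES set (sketch)
  docker_container:
    name: testgpu
    image: "{{ image }}"
    state: started
    env:
      # Assumption: expose all GPUs; use e.g. "0,1" to restrict to specific ones
      NVIDIA_VISIBLE_DEVICES: all
```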

But I see no way to set the runtime via the docker_container module as of Ansible 2.4, the current release.
You can try to work around this by setting "default-runtime": "nvidia" in your daemon.json configuration file, so that the Docker daemon uses the nvidia runtime by default.
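As a rough sketch, the daemon.json change could itself be applied from the playbook. The file location `/etc/docker/daemon.json`, the runtime binary path `/usr/bin/nvidia-container-runtime`, and the variable name `nvidia_daemon_config` are assumptions here; check them against your nvidia-docker installation. Note that this task overwrites any existing daemon.json, so merge in your other settings if you have any.

```yaml
- name: Make nvidia the default Docker runtime (sketch; replaces existing daemon.json)
  become: true
  copy:
    dest: /etc/docker/daemon.json
    content: "{{ nvidia_daemon_config | to_nice_json }}"
  vars:
    # Hypothetical variable name, rendered to JSON by the to_nice_json filter
    nvidia_daemon_config:
      default-runtime: nvidia
      runtimes:
        nvidia:
          # Assumed install path of the NVIDIA runtime binary
          path: /usr/bin/nvidia-container-runtime
          runtimeArgs: []

- name: Restart Docker so the new default runtime takes effect
  become: true
  service:
    name: docker
    state: restarted
```

With the default runtime switched, a plain docker_container task should start the container under the nvidia runtime without any runtime option.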

Konstantin Suvorov
  • Thanks for your suggestions. I tried to write my own module based on docker_container, but ultimately it fails because `runtime` is an unexpected param for api/container.py. I also tried to override the default-runtime, without success. For now I will just use Ansible's `command`; hopefully they will make "runtime" customizable in a new release. – PERPO Jan 09 '18 at 17:37
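
The `command` fallback mentioned in the comment above might look roughly like this. It is only a sketch: the `-d` flag and the container name `testgpu` are carried over from the question as assumptions, and unlike docker_container the task is not idempotent, so it fails if a container with that name already exists.

```yaml
- name: Start GPU container through the nvidia-docker wrapper (sketch)
  # Assumes nvidia-docker is on the PATH of the target host
  command: nvidia-docker run -d --name testgpu {{ image }}
```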