
I would like to know how to increase the timeout limit of nvidia-docker at initialization.

When 2 or more GPUs of my 4-GPU server are busy, I always get a timeout error:

nvidia-container-cli: initialization error: driver error: timed out

when launching docker:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Thank you very much in advance for your help!

f10w

2 Answers


I don't know how to change the timeout, but you can work around this problem by starting nvidia-persistenced beforehand. It initializes the GPU devices and keeps them open, so the driver doesn't have to go through that process during docker startup.
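As a minimal sketch, assuming a systemd-based distribution where the driver packaging ships an nvidia-persistenced unit (names and setup can differ, so treat this as an assumption rather than a universal recipe):

# Enable and start the NVIDIA persistence daemon so the GPUs stay initialized
sudo systemctl enable nvidia-persistenced
sudo systemctl start nvidia-persistenced

# Verify that persistence mode is reported as "On" for each GPU
nvidia-smi --query-gpu=index,persistence_mode --format=csv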

mulad

This is not an exact answer to the question, only a workaround for the timed-out error.

Before launching docker, run nvidia-smi to see which processes are running on the GPUs. Suspend these processes using:

kill -TSTP [pid]

Then launch docker. When done, resume the previously suspended processes using:

kill -CONT [pid]
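
Putting the whole workaround together as a hedged sketch (12345 is a placeholder PID that you would read from the nvidia-smi process list, not a value from the original question):

# 1. Find the PIDs currently using the GPUs
nvidia-smi

# 2. Suspend those processes so the GPUs become idle
kill -TSTP 12345

# 3. Launch the container while the GPUs are free
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

# 4. Resume the suspended processes afterwards
kill -CONT 12345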
f10w