
I would like to know how to increase the timeout limit of nvidia-docker at initialization.

When 2 or more GPUs of my 4-GPU server are busy, I always get a timeout error:

nvidia-container-cli: initialization error: driver error: timed out

when launching docker:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Thank you very much in advance for your help!

f10w

2 Answers


I don't know how to change the timeout, but you can work around this problem by starting nvidia-persistenced beforehand. It initializes the GPU devices and keeps them open, so the driver doesn't have to go through that process during docker startup.
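As a minimal sketch, assuming a systemd-based distribution where the driver packaging ships an nvidia-persistenced unit (names and setup can differ, so treat this as an assumption rather than a universal recipe):

# Enable and start the NVIDIA persistence daemon so the GPUs stay initialized
sudo systemctl enable nvidia-persistenced
sudo systemctl start nvidia-persistenced

# Verify that persistence mode is reported as "On" for each GPU
nvidia-smi --query-gpu=index,persistence_mode --format=csv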

mulad

This is not an exact answer to the question, only a workaround for the timed-out error.

Before launching docker, run nvidia-smi to see which processes are running on the GPUs. Suspend these processes using:

kill -TSTP [pid]

Then launch docker. When done, resume the previously suspended processes using:

kill -CONT [pid]
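
Putting the whole workaround together as a hedged sketch (12345 is a placeholder PID that you would read from the nvidia-smi process list, not a value from the original question):

# 1. Find the PIDs currently using the GPUs
nvidia-smi

# 2. Suspend those processes so the GPUs become idle
kill -TSTP 12345

# 3. Launch the container while the GPUs are free
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

# 4. Resume the suspended processes afterwards
kill -CONT 12345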
f10w