First run of a nvidia-docker2 container very slow

Question

When running a GPU-enabled docker container on an EC2 p2.xlarge instance I experience a delay of between 30 and 90 seconds before the container starts running. Subsequent containers start fast (1 second delay).

The EC2 instance is running Ubuntu 18.04 with NVIDIA driver version 396.54 and nvidia-docker2, installed following the official installation guide: https://github.com/NVIDIA/nvidia-docker

I am testing using the latest official CUDA image: docker run --rm nvidia/cuda nvidia-smi

Persistence mode is enabled on my machine. According to the FAQ entry "Why is my container slow to start with 2.0?" (https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#how-do-i-install-the-nvidia-driver), this should solve the problem, but it doesn't work for me.

Any ideas what might be causing the delay and how to fix it are appreciated.

Maybe asking the obvious, but did you pull the nvidia/cuda image before the first execution? If not, that's the expected behavior, because Docker has to pull the image first, and it is at least 0.5 GB. – Gabriel Miretti aka gmiretti, Sep 02 '18 at 00:29
Yes, I did pull the image first. The time needed to pull the image is not included in the 30 to 90 second delay. – lenawal, Sep 03 '18 at 07:05

score 0 · Answer 1 · answered Sep 04 '18 at 20:50

I see in the comments that you've already pulled the Docker image from the internet, but are you sure that the image wasn't saved to an EBS snapshot? For example, during the creation of the AMI with NVIDIA Docker, you might have pulled that image and saved it to the root AMI volume.

If that's the case, then the delay comes from how EBS volumes are restored from snapshots.

From AWS documentation (Initializing Amazon EBS Volumes):

... storage blocks on volumes that were restored from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed.

So when you run your Docker container for the first time, AWS is still downloading the data from S3 to your EBS volume, which takes some time. The second time, your container starts fast because the data is already on the volume.
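
For reference, the initialization described in the quoted AWS documentation comes down to reading every block of the volume once. Below is a minimal sketch of that idea. It assumes the root volume shows up as /dev/xvda (check with lsblk); both the device name and the helper name init_volume are placeholders, not something taken from the question.

```shell
# init_volume is a hypothetical helper: it reads the whole device once and
# discards the data, so every block is fetched from S3 before Docker needs it.
init_volume() {
    dev="$1"
    dd if="$dev" of=/dev/null bs=1M
}

# Assumed usage, run as root once after the instance boots:
#   lsblk                    # find the device that backs the root volume
#   init_volume /dev/xvda    # assumed device name, adjust to the lsblk output
```

Reading the whole volume takes time proportional to its size, so this trades the one-off delay at container start for a longer step right after boot.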

Oleg Polosin

Your answer seems to point in the right direction: after initializing the EBS volume, the container starts much faster. But in contrast to what you assumed, the image is downloaded after the EC2 instance has launched and was not contained in the AMI. I also noticed that when creating an AMI from the instance after the container had run on it (the new AMI contains the exited container), the delay is much shorter. – lenawal Sep 07 '18 at 13:36
Are you sure you're downloading all the layers of the image? Maybe the image you're downloading is based on another image that was already on the volume. Check for existing images before the download using the command: `$ docker images -a`. If you have some, try to remove all of them: `$ docker rmi $(docker images -a -q)` and then download your image. – Oleg Polosin Sep 08 '18 at 16:16
There are no images on the volume. All image layers are downloaded after the instance launch. – lenawal Sep 11 '18 at 07:13
