Objective
Understand how GPUs are utilized in Metaflow.
Background
As in "Documentation / Explanation on how to use GPU" (#250), there are several discussions on how to use GPUs.
It looks as if @resources(gpu=2) handles GPU allocation, but some discussions state that an AWS EC2 instance type with a GPU (such as a P or G instance) is required, as well as a specific type of AMI.
In my understanding, Metaflow uses AWS Batch, which runs tasks as Docker containers via ECS. The containers then need access to the GPU drivers (e.g. via the NVIDIA Container Toolkit) to be able to use the GPUs.
Hence it is not clear how Metaflow manages GPUs, and what prerequisites, configuration, and code are required to use them.
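For reference, the kind of runtime check I have in mind is sketched below: a step body could verify whether any GPUs are actually visible inside its container. This is my own illustration, not Metaflow code; it assumes only that nvidia-smi is on the PATH when the host driver is exposed to the container (as the NVIDIA Container Toolkit does).

```python
import shutil
import subprocess

def visible_gpu_count():
    """Return the number of GPUs visible inside the current
    container, or 0 if no NVIDIA driver is exposed to it."""
    # nvidia-smi is only present when the host's driver stack is
    # mounted into the container (e.g. by the NVIDIA Container Toolkit).
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "--list-gpus"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return 0
    # --list-gpus prints one line per device.
    return len([line for line in result.stdout.splitlines() if line.strip()])
```

Calling visible_gpu_count() inside a @resources(gpu=2)-decorated step and comparing the result against the requested count would show whether the allocation actually reached the container.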
Questions
- Do we need to use a specific AMI in which GPU drivers have been pre-configured at the EC2 instance (Docker host) level?
- Do we need to use a specific EC2 instance type with a GPU (P3, P4, G3, G4, Inf1)? Or does Metaflow use a service such as AWS Elastic Inference to dynamically attach GPU acceleration even if the Batch/ECS EC2 instances do not have GPUs?
- Does Metaflow look after installing GPU drivers inside the Docker container, or does it use the NVIDIA Container Toolkit internally?
- Is @resources all we need in the Python code to utilize GPUs?