Objective
Understand how GPUs are utilized in Metaflow.
Background
As in "Documentation / Explanation on how to use GPU" (#250), there are several discussions on how to use GPUs.
It looks as if @resources(gpu=2) handles GPU allocation, but some discussions state that an AWS EC2 instance type with a GPU (such as a P or G instance) is required, as well as a specific type of AMI.
In my understanding, Metaflow uses AWS Batch, which runs tasks as Docker containers via ECS. The containers then need access to the GPU drivers (e.g. via the NVIDIA Container Toolkit) to be able to use the GPUs.
Hence it is not clear how Metaflow manages GPUs, and what prerequisites, configuration, and code are required to use them.
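For reference, the kind of runtime check I have in mind is sketched below: a step body could verify whether any GPUs are actually visible inside its container. This is my own illustration, not Metaflow code; it assumes only that nvidia-smi is on the PATH when the host driver is exposed to the container (as the NVIDIA Container Toolkit does).

```python
import shutil
import subprocess

def visible_gpu_count():
    """Return the number of GPUs visible inside the current
    container, or 0 if no NVIDIA driver is exposed to it."""
    # nvidia-smi is only present when the host's driver stack is
    # mounted into the container (e.g. by the NVIDIA Container Toolkit).
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "--list-gpus"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return 0
    # --list-gpus prints one line per device.
    return len([line for line in result.stdout.splitlines() if line.strip()])
```

Calling visible_gpu_count() inside a @resources(gpu=2)-decorated step and comparing the result against the requested count would show whether the allocation actually reached the container.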
Questions
- Do we need to use a specific AMI in which GPU drivers have been pre-configured at the EC2 instance (Docker host) level?
- Do we need to use a specific EC2 instance type with a GPU (P3, P4, G3, G4, Inf1)? Or does Metaflow use a service such as AWS Elastic Inference to dynamically attach GPU acceleration even if the Batch/ECS EC2 instances do not have GPUs?
- Does Metaflow look after installing GPU drivers inside the Docker container, or does it use the NVIDIA Container Toolkit internally?
- Is @resources all we need in the Python code to utilize GPUs?