I have an NVIDIA A30 Linux instance with a single 24 GB GPU, and I am planning to host 3 similar APIs on it. The APIs run as containers from the same Docker image. I have given all 3 containers access to the GPU using the NVIDIA Container Toolkit, and as expected I get the desired outputs from them.

The problem is this: when only one container is up and it receives a request, the GPU performs at its maximum capacity. But if I start the remaining 2 containers and they also receive simultaneous requests, the GPU performance seen by each container is effectively divided by 3.
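To make the slowdown concrete, one option is to sample GPU utilization and memory on the host, first with one container serving requests and then with all three. The following is a minimal sketch, assuming `nvidia-smi` is on the host's PATH; the helper names `parse_gpu_csv` and `sample_gpu` are hypothetical and only for illustration.

```python
import shutil
import subprocess

# Standard nvidia-smi query fields: GPU utilization (%), memory used/total (MiB).
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Assumes every field is numeric; fields reported as "[N/A]" are not handled.
    """
    rows = []
    for line in text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        rows.append(
            {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return rows


def sample_gpu():
    """Run nvidia-smi once and return one dict per GPU (hypothetical helper)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(out)


# Only query the GPU when run as a script on a machine that has nvidia-smi.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for gpu in sample_gpu():
        print(gpu)
```

If utilization is already close to 100% with a single container under load, the three containers are most likely time-sharing one saturated GPU, in which case a roughly threefold slowdown per request would be expected rather than a sign of a Docker misconfiguration.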

I have tried different values for `--shm-size` and `--memory` in the `docker run` command, but to no avail.

Can someone help out with this?

talonmies
Aman Jain
  • To troubleshoot this, can you try running these Docker containers as privileged, i.e. with `docker run --privileged` but without the `--gpus` option? Please note that you need to trust these containers, since `--privileged` grants a container access to your whole system. – Anis Ladram Mar 29 '23 at 20:52

0 Answers