Running Triton Server Inference on AWS GPU Graviton instance

Question

I am currently running a Triton server in production on AWS using a standard GPU-enabled EC2 instance, which is very expensive.

I have seen that the new GPU-enabled Graviton instances can be 40% cheaper to run. However, they run on ARM (not x86/AMD64). Does this mean I can run the standard version of Triton server on these instances?

Looking at the Triton server release notes, I have seen that it can run on Jetson Nano, which is an ARM platform with an NVIDIA GPU: https://github.com/triton-inference-server/server/releases/tag/v1.12.0

Does this method reduce my costs? Can I run Triton server on these graviton instances?

Does performance drop using these instances?

Geoffrey Blake · Answer 1 · 2022-12-07T17:10:50.943


Looking at NVIDIA's NGC container repository, there are containers built for Arm64 for the most recent version. On the surface, it appears it should work on G5g. I would recommend trying the container and testing whether it suits your needs. Without testing your specific workload, it is impossible to know up front what the performance would be and, by extension, whether it's cheaper.
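To make the "cheaper by extension" point concrete, here is a rough sketch of the arithmetic: a lower hourly price only saves money if throughput does not drop by more than the price discount. The function names, hourly prices, and throughput figures are placeholders I made up for illustration (only the 40% discount comes from the question); substitute the numbers you measure on your own workload.

```python
def cost_per_million(hourly_price_usd: float, throughput_per_sec: float) -> float:
    """USD needed to serve one million inferences at a sustained throughput."""
    if hourly_price_usd < 0 or throughput_per_sec <= 0:
        raise ValueError("price must be >= 0 and throughput must be > 0")
    inferences_per_hour = throughput_per_sec * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000


def break_even_throughput_ratio(price_discount: float) -> float:
    """Fraction of the baseline throughput the cheaper instance must sustain
    to cost the same per inference (price_discount = 0.40 means 40% cheaper)."""
    if not 0 <= price_discount < 1:
        raise ValueError("price_discount must be in [0, 1)")
    return 1.0 - price_discount


# Placeholder inputs, NOT real AWS prices or benchmark results.
baseline_cost = cost_per_million(hourly_price_usd=1.00, throughput_per_sec=500.0)
graviton_cost = cost_per_million(hourly_price_usd=0.60, throughput_per_sec=400.0)

print(f"baseline: ${baseline_cost:.4f} per million inferences")
print(f"graviton: ${graviton_cost:.4f} per million inferences")
print(f"break-even throughput ratio: {break_even_throughput_ratio(0.40):.2f}")
```

With a 40% price discount, the Graviton instance would stay cheaper per inference as long as it sustains more than 60% of the baseline throughput, which is why measuring your own model on both instance types matters.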

edited Dec 07 '22 at 17:10

answered Dec 06 '22 at 22:16

