
Right now, I'm working on my master's thesis, and I need to train a huge Transformer model on GCP. Since the fastest way to train deep learning models is on a GPU, I was wondering: which of the GPUs offered by GCP should I use? The ones available at the moment are:

  • NVIDIA® A100
  • NVIDIA® T4
  • NVIDIA® V100
  • NVIDIA® P100
  • NVIDIA® P4
  • NVIDIA® K80
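
You can list the accelerator types available in a given zone with the Compute Engine API's acceleratorTypes.list method. A minimal sketch, assuming google-api-python-client is installed and application-default credentials are configured (the project ID and zone below are placeholders):

```python
from googleapiclient import discovery

# Build a Compute Engine API client (uses application-default credentials).
compute = discovery.build("compute", "v1")

# acceleratorTypes.list returns the GPU types offered in the given zone.
# "my-project" and "us-central1-a" are placeholders; substitute your own.
response = compute.acceleratorTypes().list(
    project="my-project", zone="us-central1-a"
).execute()

for accel in response.get("items", []):
    print(accel["name"], "-", accel.get("description", ""))
```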
Anwarvic

2 Answers


It all depends on the characteristics you're looking for.

First, let's collect some information about these GPU models and see which one suits you best. You can google each model's name to find its specifications. I did that and put together the following table (prices are GCP's hourly rates at the time of writing, rounded):

Model               FP32 (TFLOPS)   Price ($/hr)   TFLOPS/dollar
Nvidia A100         19.5            2.93           6.65
Nvidia Tesla T4     8.1             0.35           23.14
Nvidia Tesla P4     5.5             0.60           9.17
Nvidia Tesla V100   14.0            2.48           5.65
Nvidia Tesla P100   9.3             1.46           6.37
Nvidia Tesla K80    8.73            0.45           19.40

In the table above, you can see:

  • FP32: 32-bit (single-precision) floating-point performance, i.e., how fast the card performs single-precision floating-point operations. It's measured in TFLOPS (tera floating-point operations per second). The higher, the better.
  • Price: the hourly price on GCP, in USD.
  • TFLOPS/dollar: simply how many operations you get for one dollar. For example, the T4 delivers 8.1 / 0.35 ≈ 23.14 TFLOPS per dollar.

From this table, you can see:

  • Nvidia A100 is the fastest.
  • Nvidia Tesla P4 is the slowest.
  • Nvidia A100 is the most expensive.
  • Nvidia Tesla T4 is the cheapest.
  • Nvidia Tesla T4 has the highest operations per dollar.
  • Nvidia Tesla V100 has the lowest operations per dollar.
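
If you'd like to reproduce these rankings yourself, here is a minimal Python sketch using the numbers from the table (prices are rounded and change over time, so double-check GCP's pricing page):

```python
# GPU specs from the table above; prices are rounded GCP hourly rates
# at the time of writing.
gpus = {
    "Nvidia A100":       {"fp32_tflops": 19.5, "price_per_hour": 2.93},
    "Nvidia Tesla T4":   {"fp32_tflops": 8.1,  "price_per_hour": 0.35},
    "Nvidia Tesla P4":   {"fp32_tflops": 5.5,  "price_per_hour": 0.60},
    "Nvidia Tesla V100": {"fp32_tflops": 14.0, "price_per_hour": 2.48},
    "Nvidia Tesla P100": {"fp32_tflops": 9.3,  "price_per_hour": 1.46},
    "Nvidia Tesla K80":  {"fp32_tflops": 8.73, "price_per_hour": 0.45},
}

# TFLOPS per dollar = FP32 throughput / hourly price.
for spec in gpus.values():
    spec["tflops_per_dollar"] = spec["fp32_tflops"] / spec["price_per_hour"]

print("Fastest:   ", max(gpus, key=lambda n: gpus[n]["fp32_tflops"]))
print("Cheapest:  ", min(gpus, key=lambda n: gpus[n]["price_per_hour"]))
print("Best value:", max(gpus, key=lambda n: gpus[n]["tflops_per_dollar"]))
```

Running this prints the A100 as the fastest and the T4 as both the cheapest and the best value, matching the observations above.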

And you can observe that clearly in the following figure:

[Figure: visual comparison of the GPU models on the three metrics above]

I hope that was helpful!

Anwarvic
  • TL;DR: A100 GPUs are great if you can afford them. Otherwise, T4 GPUs offer the best bang for buck. – Vibhansh Nov 02 '21 at 09:00

Nvidia says that using the most modern, powerful GPUs is not only faster but also ends up being cheaper: https://developer.nvidia.com/blog/saving-time-and-money-in-the-cloud-with-the-latest-nvidia-powered-instances/

Google came to a similar conclusion (this was a couple of years ago before the A100 was available): https://cloud.google.com/blog/products/ai-machine-learning/your-ml-workloads-cheaper-and-faster-with-the-latest-gpus
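
The arithmetic behind both claims is simply total cost = training hours × hourly price: a GPU that costs N× more per hour is still cheaper overall if it trains your model more than N× faster. A toy illustration in Python (the 10× speedup below is a hypothetical figure for illustration, not a benchmark):

```python
# Toy cost comparison: total cost = hours to train x hourly price.
# The 10x speedup is hypothetical, purely for illustration.
k80_hours, k80_price = 100, 0.45               # assume 100 h on a K80
a100_hours, a100_price = k80_hours / 10, 2.93  # assume the A100 is 10x faster

print(f"K80:  {k80_hours:.0f} h x ${k80_price}/h  = ${k80_hours * k80_price:.2f}")
print(f"A100: {a100_hours:.0f} h x ${a100_price}/h = ${a100_hours * a100_price:.2f}")
# Any speedup above ~6.5x (the price ratio) makes the A100 run cheaper.
```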

I guess you could make an argument that both Nvidia and Google could be a little biased in making that judgement, but they are also well placed to answer the question and I see no reason not to trust them.

craq