Why does the magma_dgemm function not use tensor cores on the V100 GPU?

Question

I ran the MAGMA testing_dgemm code on both a V100 and an H100 GPU. With Nsight Systems, I found that the code does not use Tensor Cores on the V100, but it does on the H100.

V100 result: (Nsight Systems screenshot not included)

H100 result: (Nsight Systems screenshot not included)

According to NVIDIA's website, Volta GPUs do include Tensor Cores.

NVIDIA's Inside Volta blog post does not seem to mention FP64 Tensor Core performance.

See also [Double-Precision Tensor Cores Speed High-Performance Computing](https://blogs.nvidia.com/blog/2020/05/14/double-precision-tensor-cores/) on the developer blog about A100 tensor cores. — paleonix, Aug 09 '23 at 14:03

score 3 · Accepted Answer · answered Aug 09 '23 at 13:55


The V100 GPU doesn't have an FP64 (double-precision) path in its Tensor Core unit.

That capability was introduced with the third-generation Tensor Cores in the Ampere A100.

So when performing FP64 arithmetic, such as a DGEMM, the V100 generally will not use Tensor Cores.

From here:

NVIDIA A100 introduces double precision Tensor Cores ...

(emphasis added)
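As a rough illustration of the point above, here is a minimal host-only C++ sketch that records which Tensor Core input precisions the three data-center GPUs in this thread support. The struct, table, and function names (`TensorCoreSupport`, `kKnownGpus`, `has_fp64_tensor_cores`) are hypothetical and exist only for this example. The table is a hand-written summary of NVIDIA's published architecture notes and is not queried from hardware. On a real system, the compute capability would come from the `major` and `minor` fields of `cudaDeviceProp`.

```cpp
#include <cstddef>

// Hypothetical summary table for illustration; only the three
// data-center parts discussed in this thread are listed.
struct TensorCoreSupport {
    const char* gpu;  // marketing name
    int major;        // CUDA compute capability, major
    int minor;        // CUDA compute capability, minor
    bool fp16;        // FP16 inputs (mixed precision)
    bool tf32;        // TF32 inputs
    bool fp64;        // FP64 (double-precision) Tensor Core path
};

constexpr TensorCoreSupport kKnownGpus[] = {
    {"V100", 7, 0, true, false, false},  // Volta: FP16 inputs only
    {"A100", 8, 0, true, true,  true},   // Ampere: adds TF32 and FP64
    {"H100", 9, 0, true, true,  true},   // Hopper: keeps TF32 and FP64
};

// Returns the table entry for a compute capability, or nullptr if the
// capability is not one of the three listed above.
const TensorCoreSupport* lookup_tensor_core_support(int major, int minor) {
    for (std::size_t i = 0; i < sizeof(kKnownGpus) / sizeof(kKnownGpus[0]); ++i) {
        if (kKnownGpus[i].major == major && kKnownGpus[i].minor == minor) {
            return &kKnownGpus[i];
        }
    }
    return nullptr;
}

// True only when the GPU is in the table and has an FP64 Tensor Core path.
// Unknown GPUs conservatively report false.
bool has_fp64_tensor_cores(int major, int minor) {
    const TensorCoreSupport* entry = lookup_tensor_core_support(major, minor);
    return entry != nullptr && entry->fp64;
}
```

With this table, `has_fp64_tensor_cores(7, 0)` is false for the V100, while `has_fp64_tensor_cores(8, 0)` and `has_fp64_tensor_cores(9, 0)` are true. This matches the V100 and H100 behavior reported in the question.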


Robert Crovella


Wow, thanks a lot! This problem has troubled me for a long time. – ingridli Aug 09 '23 at 13:59
Sorry, I have a follow-up question. I also tested the "testing_sgemm" code on the V100. In Nsight, it appears that no Tensor Cores were used (the result is similar to dgemm). But NVIDIA claims that "Tensor Cores accelerating FP32 matrix math deliver more than 120 TFLOPS of performance". So does the V100 have an FP32 path in its Tensor Core unit? – ingridli Aug 09 '23 at 14:12
@ingridli I think Volta Tensor cores are only for mixed precision, i.e. FP16 input. Using them here would result in different/worse results. – paleonix Aug 09 '23 at 14:18
That seems to make sense. Thanks again, I will check NVIDIA's blog later. – ingridli Aug 09 '23 at 14:24
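To make the "different/worse results" remark concrete, here is a small host-only C++ sketch that emulates rounding FP32 inputs to FP16 precision before multiplying, while still accumulating in FP32. The function names are hypothetical. The rounding helper is a simplification: it models only FP16's 11 significant bits and ignores its narrower exponent range (overflow and subnormals). It is not a model of how any particular GPU or library performs the conversion.

```cpp
#include <cmath>

// Round x to 11 significant bits, the precision of IEEE FP16
// (10 stored mantissa bits plus 1 implicit bit).
// Simplification: FP16's limited exponent range is ignored.
float round_to_fp16_precision(float x) {
    if (x == 0.0f || !std::isfinite(x)) {
        return x;
    }
    int exponent = 0;
    // x == mantissa * 2^exponent with |mantissa| in [0.5, 1)
    const float mantissa = std::frexp(x, &exponent);
    // Shift 11 bits left of the binary point, round to nearest, shift back.
    const float scaled = std::ldexp(mantissa, 11);
    return std::ldexp(std::nearbyint(scaled), exponent - 11);
}

// Dot product whose inputs are rounded to FP16 precision first,
// with products accumulated in FP32 (mixed precision).
float dot_with_fp16_inputs(const float* a, const float* b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += round_to_fp16_precision(a[i]) * round_to_fp16_precision(b[i]);
    }
    return acc;
}
```

For example, `round_to_fp16_precision(0.1f)` yields 0.0999755859375 instead of the FP32 value of 0.1, so an SGEMM routed through FP16 inputs would not reproduce the plain FP32 result.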
