
I ran the MAGMA testing_dgemm code on both a V100 and an H100 GPU. With Nsight Systems, I found that the code does not use Tensor Cores on the V100, but it does on the H100.

V100 result:

Nsight Systems profiler screenshot

H100 result:

Nsight Systems profiler screenshot

According to NVIDIA's website, Volta GPUs already have Tensor Cores.

However, the NVIDIA "Inside Volta" blog post does not seem to mention FP64 Tensor Core performance.

paleonix
ingridli
  • See also [Double-Precision Tensor Cores Speed High-Performance Computing](https://blogs.nvidia.com/blog/2020/05/14/double-precision-tensor-cores/) on the developer blog about A100 tensor cores. – paleonix Aug 09 '23 at 14:03

1 Answer


The V100 GPU does not have an FP64 (double-precision) path in its Tensor Core unit.

That capability was introduced with the third-generation Tensor Cores in the Ampere A100.

So when performing FP64 arithmetic, such as dgemm, the V100 generally will not use Tensor Cores.

From here:

NVIDIA A100 introduces double precision Tensor Cores ...

(emphasis added)
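As an illustrative aside, the reasoning above can be sketched as a small lookup. This is only a sketch under stated assumptions: the table is a hand-written subset of the Tensor Core input formats listed in NVIDIA's architecture whitepapers (integer and sparse modes are left out), and the helper name `may_use_tensor_cores` is hypothetical, not part of MAGMA, cuBLAS, or any NVIDIA tool. Whether a library actually dispatches to Tensor Cores also depends on the library and its settings.

```python
# Tensor Core floating-point input formats for three data-center GPUs.
# Assumption: hand-written subset for illustration, not queried from hardware.
TENSOR_CORE_INPUTS = {
    "V100": {"FP16"},                                  # Volta, 1st gen
    "A100": {"FP16", "BF16", "TF32", "FP64"},          # Ampere, 3rd gen
    "H100": {"FP8", "FP16", "BF16", "TF32", "FP64"},   # Hopper, 4th gen
}


def may_use_tensor_cores(gpu, input_precision):
    """Return True if the GPU's Tensor Cores accept this input precision.

    Hypothetical helper: it only checks hardware eligibility and says
    nothing about what a particular BLAS library will choose to do.
    """
    return input_precision in TENSOR_CORE_INPUTS[gpu]


if __name__ == "__main__":
    for gpu in ("V100", "H100"):
        for routine, precision in (("dgemm", "FP64"), ("sgemm", "FP32")):
            eligible = may_use_tensor_cores(gpu, precision)
            print(f"{gpu} {routine} ({precision}): eligible={eligible}")
```

Under these assumptions, dgemm is eligible on the H100 but not on the V100, which matches the profiler observation in the question. Plain FP32 is absent from every entry because the Tensor Core formats closest to it (TF32 on A100 and H100) are reduced-precision formats rather than full FP32.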

Robert Crovella
  • Wow, thanks a lot! This problem has troubled me for a long time. – ingridli Aug 09 '23 at 13:59
  • Sorry, I have a follow-up question. I also tested the "testing_sgemm" code on the V100. In Nsight Systems, it appears that no Tensor Cores were used (the result is similar to dgemm). But NVIDIA claims that "Tensor Cores accelerating FP32 matrix math deliver more than 120 TFLOPS of performance". So does the V100 have an FP32 path in its Tensor Core unit? – ingridli Aug 09 '23 at 14:12
  • @ingridli I think Volta Tensor Cores are only for mixed precision, i.e. FP16 input. Using them here would give different/worse results. – paleonix Aug 09 '23 at 14:18
  • That seems to make sense. Thanks again. I will check NVIDIA's blog later. – ingridli Aug 09 '23 at 14:24
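
The point in the comments about FP16 input giving different results can be illustrated with a small pure-Python emulation. This is a rough sketch, not a model of the actual hardware: it assumes inputs are rounded to FP16 (or kept at FP32) and that the running sum is rounded to FP32 after every addition. The helpers `round_to` and `matvec` are hypothetical names introduced here.

```python
import random
import struct


def round_to(fmt, x):
    """Round a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def matvec(rows, vec, input_fmt):
    """Matrix-vector product with inputs rounded to input_fmt.

    The accumulator is rounded to FP32 after each step to mimic
    mixed-precision accumulation (an assumption of this sketch).
    """
    out = []
    for row in rows:
        acc = 0.0
        for x, y in zip(row, vec):
            prod = round_to(input_fmt, x) * round_to(input_fmt, y)
            acc = round_to("f", acc + prod)
        out.append(acc)
    return out


random.seed(0)
m, n = 16, 256
rows = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
vec = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Double-precision reference result.
reference = [sum(x * y for x, y in zip(row, vec)) for row in rows]


def max_error(result):
    return max(abs(r - ref) for r, ref in zip(result, reference))


err_fp32 = max_error(matvec(rows, vec, "f"))
err_fp16 = max_error(matvec(rows, vec, "e"))
print(f"max error with FP32 inputs: {err_fp32:.3e}")
print(f"max error with FP16 inputs: {err_fp16:.3e}")
```

Rounding the inputs to FP16 leaves about 11 significant bits instead of 24, so the FP16-input error comes out noticeably larger than the FP32 one. That is consistent with a library not silently routing an sgemm call through FP16 Tensor Cores.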