3

I have an Nvidia RTX 2060 graphics card, which has tensor cores on it. I want to run my code utilizing the tensor cores and the CUDA cores in a mixed way. The idea is to have one part of the code executed by the tensor cores and another part by the CUDA cores, in order to get a performance speedup.

My question is: is it possible to do something like that, or am I a dreamer? Thanks in advance.

  • 1
    Generally this is possible. The CUDA cores are distributed over the SMs and the 4 SM partitions per SM. So either your kernel has to issue both types of instructions, or your kernel should choose, when it starts running, what kind of instructions to issue, or you have to run different kernels at the same time and trick the GPU into distributing both of them to each SM partition. The tensor cores are quite demanding when it comes to getting enough data transferred to them, so your actual bottleneck could be the speed of the register file/shared memory/L1/L2/global memory. Also, resources like the scheduler are shared. – Sebastian Jun 08 '22 at 16:39
  • 1
    I expect such a thing to be very dependent on the target architecture. The power constraint and heat dissipation can also impact performance in such a case (dynamically switching more transistors may cause the chip to get hotter, causing frequency throttling, especially for tensor cores). It is hard to tell without a very specific setup or without doing a basic benchmark. Note that [this](https://www.anandtech.com/show/12673/titan-v-deep-learning-deep-dive/3) may help you understand how some Nvidia GPU tensor cores work. – Jérôme Richard Jun 08 '22 at 22:59
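
As a rough illustration of the "run different kernels at the same time" approach mentioned in the first comment, here is a minimal sketch using CUDA streams. The kernel names, bodies, and buffer sizes are purely illustrative placeholders; whether the two kernels actually overlap on the same SM partitions depends on occupancy, resource usage, and the hardware scheduler, as noted above.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder kernels: in a real program, one would be built
// around wmma/tensor-core instructions and the other around ordinary
// FP32 CUDA-core work.
__global__ void tensor_core_kernel(float *out) { out[threadIdx.x] = 1.0f; }
__global__ void cuda_core_kernel(float *out)   { out[threadIdx.x] = 2.0f; }

int main() {
    float *buf1, *buf2;
    cudaMalloc(&buf1, 32 * sizeof(float));
    cudaMalloc(&buf2, 32 * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Launching into different streams permits (but does not guarantee)
    // concurrent execution of the two kernels on the same GPU.
    tensor_core_kernel<<<1, 32, 0, s1>>>(buf1);
    cuda_core_kernel<<<1, 32, 0, s2>>>(buf2);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(buf1);
    cudaFree(buf2);
    return 0;
}
```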

1 Answer

3

You can look at the example in the question here for how to use tensor cores in CUDA code. The only thing to add is that matrix C doesn't have to be set to 0 and doesn't need to be reused as matrix D.

So you write normal CUDA code and insert warp-level operations like mma_sync() (from the wmma namespace) to involve the tensor cores in the computation. You can find the documentation on how to use tensor cores in normal CUDA code here.

Header and namespace:

#include <mma.h>
using namespace nvcuda;
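
For concreteness, here is a minimal sketch of a wmma-based kernel, assuming a single 16×16×16 tile computed by one warp with half-precision inputs and a float accumulator (supported on sm_70+, including the RTX 2060). The kernel name, matrix layouts, and leading dimensions are illustrative only.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes a single 16x16 tile: D = 0.5 * (A * B + C).
// A and B are half precision, C and D are float.
__global__ void wmma_tile_kernel(const half *a, const half *b,
                                 const float *c, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::load_matrix_sync(a_frag, a, 16);                        // ldm = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::load_matrix_sync(acc_frag, c, 16, wmma::mem_row_major);

    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);           // tensor cores

    // Ordinary CUDA-core (FP32) work mixed into the same kernel:
    for (int i = 0; i < acc_frag.num_elements; ++i)
        acc_frag.x[i] *= 0.5f;

    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}
```

Compile for the card's architecture (e.g. nvcc -arch=sm_75 for an RTX 2060) and launch with at least one full warp, e.g. wmma_tile_kernel<<<1, 32>>>(dA, dB, dC, dD);.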
Serge Rogatch