As I understand it, an Nvidia Tensor Core multiplies two 4x4 matrices and adds the result to a third 4x4 matrix. Multiplying two 4x4 matrices produces a 4x4 matrix, and adding two 4x4 matrices also produces a 4x4 matrix. Still, Nvidia says "Each Tensor Core provides a 4x4x4 matrix processing array".
Four multiply-accumulate operations are needed for each row-times-column element of the result. I thought maybe the last x4 comes from the intermediate results before the accumulation, but that doesn't quite fit the description on Nvidia's pages:
"The FP16 multiply results in a full precision result that is accumulated in FP32 operations with the other products in a given dot product for a 4x4x4 matrix multiply, as Figure 9 shows." https://developer.nvidia.com/blog/cuda-9-features-revealed/
A 4x4x4 matrix multiply? I thought matrices were two-dimensional by definition.
Can someone please explain where the last x4 comes from?