In the book Programming Massively Parallel Processors, the number of GFLOPS is used to compare the efficiency of different matrix multiplication kernels. How would I compute this for my own kernels on my own machine?
Somewhere in the NVIDIA Forums I found this 'algorithm', but I don't know how valid it is or where the factor of two comes from.
NumOps = 2 * pow(MatrixSize,3)
gflops = 1.0e-9 * NumOps / ExecutionTime
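
For context, here is a minimal sketch of how I imagine that formula being applied in practice, timing the kernel with CUDA events (myMatMulKernel, dA, dB, dC are hypothetical placeholders for my own kernel and device buffers, not anything from the book):

#include <cuda_runtime.h>

// Hypothetical kernel declaration -- stands in for whatever kernel is being measured.
__global__ void myMatMulKernel(const float *A, const float *B, float *C, int n);

double measureGflops(const float *dA, const float *dB, float *dC, int matrixSize)
{
    dim3 block(16, 16);
    dim3 grid((matrixSize + block.x - 1) / block.x,
              (matrixSize + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time only the kernel execution.
    cudaEventRecord(start);
    myMatMulKernel<<<grid, block>>>(dA, dB, dC, matrixSize);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds

    // Apply the formula from the forum post: NumOps = 2 * MatrixSize^3.
    double numOps = 2.0 * (double)matrixSize * matrixSize * matrixSize;
    double gflops = 1.0e-9 * numOps / (ms / 1000.0);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return gflops;
}

Is that roughly the right way to measure ExecutionTime, or does the comparison in the book assume something different?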
p.s. please feel free to change the tags...