In the book Programming Massively Parallel Processors, the number of GFLOPS is used to compare the efficiency of different matrix multiplication kernels. How would I compute this for my own kernels on my own machine?
Somewhere in the NVIDIA Forums I found this 'algorithm', but I don't know how valid it is or where the factor of two comes from.
NumOps = 2 * pow(MatrixSize,3)
gflops = 1.0e-9 * NumOps / ExecutionTime
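
For context, here is a minimal sketch of how I imagine that formula being applied in practice, timing the kernel with CUDA events (myMatMulKernel, dA, dB, dC are hypothetical placeholders for my own kernel and device buffers, not anything from the book):

#include <cuda_runtime.h>

// Hypothetical kernel declaration -- stands in for whatever kernel is being measured.
__global__ void myMatMulKernel(const float *A, const float *B, float *C, int n);

double measureGflops(const float *dA, const float *dB, float *dC, int matrixSize)
{
    dim3 block(16, 16);
    dim3 grid((matrixSize + block.x - 1) / block.x,
              (matrixSize + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time only the kernel execution.
    cudaEventRecord(start);
    myMatMulKernel<<<grid, block>>>(dA, dB, dC, matrixSize);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds

    // Apply the formula from the forum post: NumOps = 2 * MatrixSize^3.
    double numOps = 2.0 * (double)matrixSize * matrixSize * matrixSize;
    double gflops = 1.0e-9 * numOps / (ms / 1000.0);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return gflops;
}

Is that roughly the right way to measure ExecutionTime, or does the comparison in the book assume something different?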
p.s. please feel free to change the tags...