I have a numerically intensive application, and after looking up GFLOPS figures on the internet I decided to write my own small benchmark. It runs a single-threaded matrix-vector multiplication thousands of times, for about a second of total execution. This is the inner loop:
    for (int i = 0; i < SIZEA; i++)
        for (int j = 0; j < SIZEB; j++)
            vector_out[i] = vector_out[i] + vector[j] * matrix[i, j];
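For context, a minimal sketch of how a loop like this might be timed and converted to MFLOPS. This is not the linked benchmark: the sizes, the repetition count, the `double` element type, and the names `FlopsSketch` and `Measure` are all assumptions made for illustration. It counts two floating-point operations (one multiply, one add) per inner iteration.

```csharp
using System;
using System.Diagnostics;

static class FlopsSketch
{
    // Hypothetical sizes and repetition count (assumptions, not the original values).
    const int SIZEA = 1000;
    const int SIZEB = 1000;
    const int REPS = 100;

    // Runs the matrix-vector product REPS times and returns the measured MFLOPS.
    public static double Measure()
    {
        // double is an assumption; the original element type is not stated.
        var matrix = new double[SIZEA, SIZEB];
        var vector = new double[SIZEB];
        var vector_out = new double[SIZEA];

        // Fill the inputs with a simple known value.
        for (int j = 0; j < SIZEB; j++)
        {
            vector[j] = 1.0;
            for (int i = 0; i < SIZEA; i++)
                matrix[i, j] = 1.0;
        }

        var sw = Stopwatch.StartNew();
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < SIZEA; i++)
                for (int j = 0; j < SIZEB; j++)
                    vector_out[i] = vector_out[i] + vector[j] * matrix[i, j];
        sw.Stop();

        // Print one result element so the computed values are actually used.
        Console.WriteLine(vector_out[0]);

        // One multiply and one add per inner iteration.
        double flops = 2.0 * SIZEA * SIZEB * REPS;
        return flops / sw.Elapsed.TotalSeconds / 1e6;
    }

    static void Main()
    {
        Console.WriteLine($"{Measure():F0} MFLOPS");
    }
}
```

Dividing the clock frequency by the reported rate then gives cycles per FLOP. The first pass also includes JIT compilation time, so a warm-up call before timing may give a steadier number.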
It's been years since I dealt with FLOPS, so I expected something around 3 to 6 cycles per FLOP. Instead I am getting 30 cycles per FLOP (100 MFLOPS). I will surely get more if I parallelize this, but I did not expect the single-thread number to be so low. Could this be a problem with .NET, or is this really the CPU's performance?
Here is a fiddle with the full benchmark code.
EDIT: Running under Visual Studio takes longer, even in release mode. The executable run by itself takes 12 cycles per FLOP (250 MFLOPS). Still, is there any VM impact?
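As a sanity check on the figures quoted in this question, cycles per FLOP is the clock frequency divided by the FLOP rate. Both pairs of numbers imply the same clock of about 3 GHz; that clock is inferred from the figures, not separately measured.

$$
\text{cycles per FLOP} = \frac{f_{\text{clock}}}{\text{FLOP rate}}
\quad\Longrightarrow\quad
f_{\text{clock}} = 30 \times 100\ \text{MFLOPS} = 12 \times 250\ \text{MFLOPS} = 3 \times 10^{9}\ \text{cycles/s}
$$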