I'm trying to build a roofline model for a node in a supercomputer that I'm running simulations on. The node has 2x Intel Xeon E5-2650 v2 (Ivy Bridge) 8-core 2.6 GHz processors (16 cores per node), with 64 GB RAM total (4 GB per core). The maximum memory bandwidth for the Intel Xeon E5-2650 v2 is shown here as 59.7 GB/s.
Achieved GFLOPS = max mem bandwidth x arithmetic intensity.
Max GFLOPS = num cores x clock frequency in GHz x ops/cycle.
My code has arithmetic intensity of 1/3 and uses double precision floating point.
Here are my calculations of the peak GFLOPs for the different types of program:
Sequential program (single core) no vectorisation:
- 1x2.6x1 (I assume without vectorisation, we can only achieve 1 op/cycle?) = 2.6 GFLOPs
Sequential program (single core) with vectorisation (SSE):
- 1x2.6x8 = 20.8 GFLOPs
All cores on one Xeon with vectorisation (SSE):
- 8x2.6x8 = 166.4 GFLOPs
All cores on both Xeons with vectorisation (SSE):
- 2x 8x2.6x8 = 332.8 GFLOPs
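To make the peak calculations above easy to check, here is a minimal Python sketch of the same arithmetic. The function name `peak_gflops` and the `configs` table are just names I made up for illustration, and the ops/cycle values (1 without vectorisation, 8 with) are my assumptions from above, not verified hardware figures.

```python
def peak_gflops(cores, clock_ghz, ops_per_cycle):
    """Peak GFLOPs = num cores x clock frequency in GHz x ops/cycle."""
    return cores * clock_ghz * ops_per_cycle


CLOCK_GHZ = 2.6  # clock frequency of one Xeon E5-2650 v2 core

# (number of cores, assumed ops/cycle) for each type of program
configs = {
    "single core, no vectorisation": (1, 1),
    "single core, vectorised": (1, 8),
    "one Xeon (8 cores), vectorised": (8, 8),
    "two Xeons (16 cores), vectorised": (16, 8),
}

for name, (cores, ops_per_cycle) in configs.items():
    print(f"{name}: {peak_gflops(cores, CLOCK_GHZ, ops_per_cycle):.1f} GFLOPs")
```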
How does the memory bandwidth available to the program change between the different types of program shown above? I know that the max memory bandwidth for one Xeon E5-2650 is 59.7 GB/s, but is this achievable on a single core? Does this become 119.4 GB/s with two Xeon E5-2650s?
So would the achieved GFLOPs (using peak bandwidth x arithmetic intensity) be:
Sequential program w/o vectorisation:
- 59.7 * 1/3 = 19.9 GFLOPs. However, because our roofline is 2.6 GFLOPs, are we limited to 2.6 GFLOPs?
Sequential program with vectorisation:
- 59.7 * 1/3 = 19.9 GFLOPs. This is achievable because our roofline is 20.8 GFLOPs.
One Xeon (using all 8 cores) with vectorisation:
- 59.7 * 1/3 = 19.9 GFLOPs. I am suspicious of this, because surely our parallel program is capable of producing more memory requests than the sequential program, and surely the sequential program doesn't saturate the memory system?
Two Xeons (total of 16 cores) with vectorisation:
- 119.4 * 1/3 = 39.8 GFLOPs.
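The achieved-GFLOPs estimates above can be sketched in Python as follows, taking the attainable rate as the smaller of the compute peak and bandwidth x arithmetic intensity, which is how I understand the roofline model. The helper `attainable_gflops` is a hypothetical name, and the bandwidth given for each case (59.7 GB/s for anything on one Xeon, 119.4 GB/s for two) is exactly the assumption I'm unsure about, not a measured value.

```python
def attainable_gflops(peak, bandwidth_gb_s, intensity):
    """Roofline estimate: min(compute peak, bandwidth x arithmetic intensity)."""
    return min(peak, bandwidth_gb_s * intensity)


INTENSITY = 1 / 3  # arithmetic intensity of my code

# (assumed peak GFLOPs, assumed memory bandwidth in GB/s) for each case
cases = {
    "sequential, no vectorisation": (2.6, 59.7),
    "sequential, vectorised": (20.8, 59.7),
    "one Xeon (8 cores), vectorised": (166.4, 59.7),
    "two Xeons (16 cores), vectorised": (332.8, 119.4),
}

for name, (peak, bandwidth) in cases.items():
    bound = attainable_gflops(peak, bandwidth, INTENSITY)
    # Compute-bound when the compute peak is the lower of the two roofs
    limit = "compute" if bound == peak else "memory"
    print(f"{name}: {bound:.1f} GFLOPs ({limit}-bound)")
```

If a single core cannot actually reach 59.7 GB/s, only the bandwidth column of `cases` would need to change.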
I feel like something is wrong with the achieved GFLOPs. Have I made a mistake somewhere?