I'm trying to build a roofline model for a node of a supercomputer that I run simulations on. The node has 2x Intel Xeon E5-2650 v2 (Ivy Bridge) 8-core 2.6 GHz processors (16 cores per node), with 64 GB RAM in total (4 GB per core). The maximum memory bandwidth for the Intel Xeon E5-2650 is shown here as 59.7 GB/s.

Achieved GFLOPS = max mem bandwidth x arithmetic intensity.

Max GFLOPS = num cores x clock frequency in GHz x ops/cycle.

My code has arithmetic intensity of 1/3 and uses double precision floating point.

Here are my calculations of the peak GFLOPs for the different types of program:

  • Sequential program (single core) no vectorisation:

    • 1x2.6x1 (I assume without vectorisation, we can only achieve 1 op/cycle?) = 2.6 GFLOPs
  • Sequential program (single core) with vectorisation (SSE):

    • 1x2.6x8 = 20.8 GFLOPs
  • All cores on one Xeon with vectorisation (SSE):

    • 8x2.6x8 = 166.4 GFLOPs
  • All cores on both Xeons with vectorisation (SSE):

    • 2x 8x2.6x8 = 332.8 GFLOPs
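The same arithmetic as a minimal Python sketch, so the numbers are easy to re-check. The 1 and 8 FLOPs per cycle per core are my assumptions from the list above, and the function and variable names are just illustrative:

```python
# Peak GFLOP/s = cores x clock (GHz) x FLOPs per cycle per core.
CLOCK_GHZ = 2.6  # clock frequency of the E5-2650 v2, as stated above


def peak_gflops(cores, flops_per_cycle, clock_ghz=CLOCK_GHZ):
    """Theoretical peak in GFLOP/s for a given core count and per-core throughput."""
    return cores * clock_ghz * flops_per_cycle


# (label, cores, assumed FLOPs/cycle/core); the 1 and 8 are assumptions, not measured
configs = [
    ("1 core, no vectorisation", 1, 1),
    ("1 core, vectorised", 1, 8),
    ("1 Xeon (8 cores), vectorised", 8, 8),
    ("2 Xeons (16 cores), vectorised", 16, 8),
]

peaks = {label: peak_gflops(cores, flops) for label, cores, flops in configs}

for label, value in peaks.items():
    print(f"{label:32s} {value:6.1f} GFLOP/s")
```

If the ops/cycle assumption is wrong, only the third column of `configs` needs to change.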

How does the memory bandwidth available to the program change between the different types of program shown above? I know that the max memory bandwidth for one Xeon E5-2650 is 59.7 GB/s, but is this achievable on a single core? Does it become 119.4 GB/s with two Xeon E5-2650s?

So would the achieved GFLOPs (using peak bandwidth x arithmetic intensity) be:

  • Sequential program w/o vectorisation:

    • 59.7 * 1/3 = 19.9 GFLOPs, however because our roofline is 2.6 GFLOPs, we are limited to 2.6 GFLOPs?
  • Sequential program with vectorisation:

    • 59.7 * 1/3 = 19.9 GFLOPs. This is achievable because our roofline is 20.8 GFLOPs.
  • One Xeon (using all 8 cores) with vectorisation:

    • 59.7 * 1/3 = 19.9 GFLOPs. I am suspicious of this: surely the parallel program can issue more memory requests than the sequential one, and surely the sequential program doesn't saturate the memory system?
  • Two Xeons (total of 16 cores) with vectorisation:

    • 119.4 * 1/3 = 39.8 GFLOPs.
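To make the comparison explicit, here is a minimal Python sketch of the bound I am applying, min(peak GFLOPs, bandwidth x arithmetic intensity), for the four cases. It assumes exactly the things I am unsure about: that a single core can draw the full 59.7 GB/s of its socket, and that two sockets give 2 x 59.7 GB/s. The names are my own.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_gflops, bandwidth_gbs * intensity)


INTENSITY = 1 / 3   # arithmetic intensity of my code
SOCKET_BW = 59.7    # GB/s per Xeon; ASSUMED to be fully available to one core

# (label, peak GFLOP/s from my calculation, assumed bandwidth in GB/s)
cases = [
    ("1 core, no vectorisation", 2.6, SOCKET_BW),
    ("1 core, vectorised", 20.8, SOCKET_BW),
    ("1 Xeon (8 cores), vectorised", 166.4, SOCKET_BW),
    ("2 Xeons (16 cores), vectorised", 332.8, 2 * SOCKET_BW),  # assumes linear scaling
]

results = {
    label: attainable_gflops(peak, bw, INTENSITY) for label, peak, bw in cases
}

for (label, peak, bw) in cases:
    memory_ceiling = bw * INTENSITY
    bound = "compute" if peak < memory_ceiling else "memory"
    print(f"{label:32s} {results[label]:6.1f} GFLOP/s ({bound}-bound)")
```

With these assumptions only the non-vectorised case is compute-bound; the other three sit on the memory ceiling, so the result depends entirely on which bandwidth figure is right for each case.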

I feel like something is wrong with the achieved GFLOPs, have I made a mistake somewhere?

JC2188