While trying to assess the performance gain from an embedded architecture, I searched for the number of floating-point multiplies that can be performed per cycle on a single core of the Core 2 and Core i7 architectures, but could not find a quick answer. Unfortunately, I am not familiar with the ISA, so I cannot tell from looking at the respective instructions. I assume it would be some kind of SIMD instruction. Any ideas?

ysap

1 Answer

One thing: Core 2 is not Intel's latest architecture. That would be Sandy Bridge.

Core 2 and Core i7 (Nehalem) can sustain 1 SSE multiply per cycle. Each SSE instruction operates on up to 4 single-precision or 2 double-precision values, so that's 2 DP or 4 SP floating-point multiplies per cycle.

Core i7 (Sandy Bridge) can sustain 1 AVX multiply per cycle. AVX registers are 256 bits wide, twice the 128 bits of SSE, so that's 4 DP or 8 SP floating-point multiplies per cycle.
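
The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration: the function name `multiplies_per_cycle` and the table names are made up for the example, and the default of one vector multiply issued per cycle is taken from the figures above rather than measured.

```python
# Vector register widths in bits for each instruction set extension.
VECTOR_BITS = {"SSE": 128, "AVX": 256}

# Element widths in bits: single precision (SP) and double precision (DP).
ELEMENT_BITS = {"SP": 32, "DP": 64}


def multiplies_per_cycle(isa, precision, vector_multiplies_per_cycle=1):
    """Scalar multiplies per cycle on one core (hypothetical helper).

    vector_multiplies_per_cycle is an assumption: 1 matches the sustained
    rates quoted above for Core 2, Nehalem and Sandy Bridge.
    """
    # Number of elements that fit in one vector register.
    lanes = VECTOR_BITS[isa] // ELEMENT_BITS[precision]
    return lanes * vector_multiplies_per_cycle


for isa in ("SSE", "AVX"):
    for precision in ("SP", "DP"):
        print(isa, precision, multiplies_per_cycle(isa, precision))
```

Multiplying the result by the clock frequency and the core count gives a theoretical peak multiply rate; real code will usually fall short of it because of memory bandwidth and dependency stalls.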

Mysticial
  • Is it safe to assume that current AMD processors offer the same performance? – ysap Nov 11 '11 at 01:56
  • Correct. I think all AMD processors since the K10 architecture have had the same SSE throughput (1 SSE multiply/cycle). For the new Bulldozer architecture, it's a little more complicated than that due to the shared FPU between the cores in each "Bulldozer Module". – Mysticial Nov 11 '11 at 01:59