To assess the performance gain from an embedded architecture, I tried to find the number of floating-point multiplies that a single core of the Core 2 and Core i7 architectures can perform per cycle, but could not find a quick answer. Unfortunately, I am not familiar with the ISA, so I cannot tell from looking at the respective instructions. I assume it would be some kind of SIMD instruction. Any ideas?
1 Answer
One thing: Core 2 is not Intel's latest architecture. That would be Sandy Bridge.
Core 2 and Core i7 Nehalem can sustain 1 SSE multiply/cycle. Each SSE instruction can operate on up to 4 single-precision or 2 double-precision values, so that's 2 DP or 4 SP floating-point multiplies per cycle.
Core i7 Sandy Bridge can sustain 1 AVX multiply/cycle. AVX vectors are 256 bits wide, twice the 128 bits of SSE, so that's 4 DP or 8 SP floating-point multiplies per cycle.
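The arithmetic behind those numbers is just vector width divided by element width, times the number of multiply instructions issued per cycle. Here is a minimal sketch of that calculation; the function name `multiplies_per_cycle` is made up for illustration, and it assumes the sustained rate of 1 vector multiply per cycle quoted above.

```python
# Register and element widths in bits.
SSE_BITS = 128   # SSE vector width
AVX_BITS = 256   # AVX vector width
SP_BITS = 32     # single-precision float
DP_BITS = 64     # double-precision float


def multiplies_per_cycle(vector_bits, element_bits, instructions_per_cycle=1):
    """Peak floating-point multiplies per cycle on one core.

    Hypothetical helper: assumes `instructions_per_cycle` vector multiplies
    can be sustained each cycle (1 for the cores discussed here).
    """
    lanes = vector_bits // element_bits  # elements packed into one vector
    return lanes * instructions_per_cycle


for name, width in (("SSE (Core 2, Nehalem)", SSE_BITS),
                    ("AVX (Sandy Bridge)", AVX_BITS)):
    sp = multiplies_per_cycle(width, SP_BITS)
    dp = multiplies_per_cycle(width, DP_BITS)
    print(f"{name}: {sp} SP or {dp} DP multiplies/cycle")
```

Multiplying the result by the core clock frequency gives a rough upper bound on multiplies per second for one core; real code will usually fall short of it because of memory bandwidth and dependency stalls.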

Mysticial
- Is it safe to assume that current AMD processors offer the same performance? – ysap Nov 11 '11 at 01:56
- Correct. I think all AMD processors since the K10 architecture have had the same SSE throughput (1 SSE multiply/cycle). For the new Bulldozer architecture, it's a little more complicated because the FPU is shared between the two cores of each "Bulldozer module". – Mysticial Nov 11 '11 at 01:59