
The Intel Xeon Phi has 32 512-bit-wide vector registers per core. Each vector register can do 16 single-precision floating-point operations per cycle, and 2 operations can be done in 1 cycle (1 in the v-pipe and 1 in the u-pipe).

I want to know how many scalar multiplications can be done in 1 clock cycle apart from the vector multiplications done in the vector registers.

Boppity Bop
arunmoezhi

1 Answer


There are some misconceptions there. There is 1 vector unit per core. Registers store values; they do not compute. So you can issue one 512-bit-wide vector operation per cycle per core. You can do a scalar multiply in 1 cycle as well, but you cannot issue both at the same time. Using the u- and v-pipes, you can issue one vector or scalar operation in one pipe and a memory operation in the other. You can also issue one fused multiply-add (MADD) instruction per cycle, which effectively gives you 2 vector operations per cycle per core.
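To put rough numbers on this, here is a minimal back-of-the-envelope sketch in Python. It assumes 32-bit single-precision elements and counts a fused multiply-add as two floating-point operations per lane. The names `lanes` and `peak_sp_flops_per_cycle` are made up for illustration and are not part of any Intel tool or API.

```python
# Back-of-the-envelope peak throughput for one core's vector unit.
# Assumptions: 512-bit vectors, 32-bit single-precision elements,
# one vector instruction issued per cycle, FMA counted as 2 flops per lane.

VECTOR_WIDTH_BITS = 512
SINGLE_PRECISION_BITS = 32


def lanes(vector_bits=VECTOR_WIDTH_BITS, element_bits=SINGLE_PRECISION_BITS):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def peak_sp_flops_per_cycle(use_fma=True):
    """Single-precision flops per cycle per core from the vector unit."""
    flops_per_lane = 2 if use_fma else 1  # multiply + add fused into one instruction
    return lanes() * flops_per_lane


if __name__ == "__main__":
    print("lanes per vector:", lanes())
    print("flops/cycle without FMA:", peak_sp_flops_per_cycle(use_fma=False))
    print("flops/cycle with FMA:", peak_sp_flops_per_cycle(use_fma=True))
```

This only models the single vector unit described above. A scalar multiply would take that issue slot instead of adding to it, so it does not raise the per-cycle figure.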

sssylvester
  • Thanks. Can you please share a link that says 2 vector operations per cycle per core can be done on Xeon Phi? And when we say 2 vector operations, does it mean they are both 512-`bit`-wide vector operations? – arunmoezhi Oct 14 '13 at 21:27
  • @sssylvester don't you mean FMA rather than MADD? – damienfrancois Oct 17 '13 at 21:22