I want to calculate the theoretical speedup of my algorithm for a neural network, so I need the relative performance of multiplication, addition, FMA (fused multiply-add), and binary operations. I read here that the ratio between a multiply-accumulate operation and a 64-bit binary operation can be taken as 1.91.

I would like to know reasonable ratios for all of these operations on a typical CPU or GPU. This wiki page mentions that Intel Core CPUs achieve 8 SP FLOPS/cycle with 4-wide SSE addition plus 4-wide SSE multiplication.
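To make the arithmetic behind that figure explicit, here is a minimal sketch. It assumes the 8 SP FLOPS/cycle number comes from one 4-wide add and one 4-wide multiply issuing in the same cycle; the function name and the `(lanes, flops_per_lane)` representation are hypothetical, chosen only for illustration.

```python
def peak_flops_per_cycle(units):
    """Sum lanes * flops_per_lane over units assumed to issue in the same cycle."""
    return sum(lanes * flops_per_lane for lanes, flops_per_lane in units)

# One 4-wide SSE add unit plus one 4-wide SSE multiply unit,
# each producing one single-precision FLOP per lane per cycle.
sse_add_and_mul = [(4, 1), (4, 1)]
sse_add_only = [(4, 1)]

print(peak_flops_per_cycle(sse_add_and_mul))  # 8
print(peak_flops_per_cycle(sse_add_only))     # 4
```

Under this assumption the peak is only reached with an even mix of adds and multiplies; a stream of adds alone would top out at half of it.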

So can I assume that addition and multiplication take equal time (in isolation), and that a multiply-accumulate operation takes the same time as either of them?

Kaivalya Swami
  • On mainstream Intel (Skylake and later), and on Xeon Phi, FP mul/add/FMA all run on the same execution unit with the same throughput and latency, so the theoretical performance ratio is 1:1:1. See https://agner.org/optimize/ and other performance links in https://stackoverflow.com/tags/x86/info. – Peter Cordes Jul 04 '19 at 03:14
  • Broadwell had 5 cycle FMA vs. 3 cycle mul or add latency, and FP add throughput was half of mul/FMA. – Peter Cordes Jul 04 '19 at 03:16
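
The ratios in the comments above can be plugged into a simple throughput-based cost model to estimate a theoretical speedup. This is only a sketch: the relative costs are read off the comments (1:1:1 on Skylake, add at half the throughput of mul/FMA on Broadwell), the operation counts are made up, and the model ignores latency, memory bandwidth, and instruction-level overlap. The names `COSTS`, `estimated_cycles`, and `theoretical_speedup` are hypothetical.

```python
# Relative cost per operation (reciprocal throughput, normalised so FMA = 1).
# Values are taken from the comments above and are a rough model only.
COSTS = {
    "skylake": {"add": 1.0, "mul": 1.0, "fma": 1.0},
    # Broadwell: FP add throughput is half that of mul/FMA, so add costs 2x.
    "broadwell": {"add": 2.0, "mul": 1.0, "fma": 1.0},
}

def estimated_cycles(op_counts, costs):
    """Weighted sum of operation counts; assumes a throughput-bound kernel."""
    return sum(count * costs[op] for op, count in op_counts.items())

def theoretical_speedup(baseline_ops, new_ops, costs):
    """Ratio of estimated cost before and after the change."""
    return estimated_cycles(baseline_ops, costs) / estimated_cycles(new_ops, costs)

# Hypothetical workload: 1000 separate multiply + add pairs
# replaced by 1000 fused multiply-adds.
baseline = {"mul": 1000, "add": 1000}
fused = {"fma": 1000}

for name, costs in COSTS.items():
    print(name, theoretical_speedup(baseline, fused, costs))
```

With these assumed counts the model gives 2.0 for Skylake and 3.0 for Broadwell. Measured speedups will usually be lower, since real kernels are rarely bound by arithmetic throughput alone.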

0 Answers