I want to calculate the theoretical speedup of my algorithm for a neural network, so I need the relative performance of multiplication, addition, FMA (fused multiply-add), and binary operations. I learned from here that the ratio between a multiply-accumulate operation and a 64-bit binary operation can be taken as 1.91.
I would like to know reasonable ratios for all of these operations on a general-purpose CPU or GPU. This wiki page mentions that Intel Core CPUs reach 8 SP FLOPS/cycle: 4-wide SSE addition + 4-wide SSE multiplication.
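This is how I read that figure, as a rough sketch. The function name and the assumption of one add unit plus one multiply unit, each issuing one 4-wide instruction per cycle, are mine and not taken from the wiki page:

```python
def peak_flops_per_cycle(simd_width, add_units=1, mul_units=1):
    """Peak FLOPs per cycle, assuming every unit issues one SIMD instruction per cycle."""
    # Each SIMD instruction performs `simd_width` scalar operations.
    return simd_width * (add_units + mul_units)


# 4-wide SSE add + 4-wide SSE multiply in the same cycle
print(peak_flops_per_cycle(4))
```

If this reading is right, the 8 FLOPS/cycle figure only holds when additions and multiplications are issued together in equal numbers.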
So can I say that addition and multiplication take equal time (in isolation), and that a multiply-accumulate operation takes the same time as either of them?
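To make concrete what I want to compute, here is a minimal sketch of the speedup estimate. The function name and the operation counts are hypothetical placeholders; only the 1.91 ratio comes from the source mentioned above, and the costs are relative units, not measured cycles:

```python
def theoretical_speedup(baseline_counts, new_counts, cost):
    """Ratio of total weighted operation cost: baseline / new."""
    def total(counts):
        # Sum of (number of operations) * (relative cost per operation).
        return sum(n * cost[op] for op, n in counts.items())
    return total(baseline_counts) / total(new_counts)


# Relative costs: a binary operation is the unit, a MAC costs 1.91 units.
cost = {"mac": 1.91, "binary": 1.0}

# Hypothetical operation counts for a baseline network and my algorithm.
baseline = {"mac": 1_000_000}
proposed = {"binary": 1_000_000}

print(theoretical_speedup(baseline, proposed, cost))
```

With equal operation counts the estimate reduces to the cost ratio itself, which is why I need trustworthy ratios for addition, multiplication, and FMA as well.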