An FMA operation (A*B + C) has a latency of 5 cycles on Intel's Haswell architecture. Can anyone explain what happens in each of the 5 cycles? For a multiply, I understand the stages to be as follows:
- Separate Mantissa and Exponent
- Multiply Mantissa
- Add Exponents
- Normalize the results
- Insert Sign
But I have been unable to find the pipeline stages for the FMA operation.
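
To make the multiply steps listed above concrete, here is a rough software sketch of them in Python. It is only a conceptual model under my own assumptions, not a description of what the hardware does in each cycle: the function name `multiply_by_parts` is made up for illustration, it uses `math.frexp`/`math.ldexp` to split and rebuild the value, and it combines the exponents by adding them (since 2^ea * 2^eb = 2^(ea+eb)). Overflow, underflow, and NaN handling are ignored.

```python
import math

def multiply_by_parts(a: float, b: float) -> float:
    """Hypothetical helper: multiply two floats via the conceptual steps."""
    # Step 1: separate sign, mantissa and exponent.
    # frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1 for nonzero x.
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    ma, ea = math.frexp(abs(a))
    mb, eb = math.frexp(abs(b))

    # Step 2: multiply the mantissas (product lies in [0.25, 1) if nonzero).
    m = ma * mb

    # Step 3: combine the exponents by adding them.
    e = ea + eb

    # Step 4: normalize so the mantissa is back in [0.5, 1).
    if m != 0.0 and m < 0.5:
        m *= 2.0
        e -= 1

    # Step 5: insert the sign and reassemble the value.
    return sign * math.ldexp(m, e)

print(multiply_by_parts(3.0, 4.0))
print(multiply_by_parts(1.5, -2.5))
```

For ordinary (normal-range) inputs this should agree with the built-in `a * b`, which is the only point of the sketch.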
Edit: It seems that the above is not the actual method used in the pipeline for multiplication (thanks to @EOF and @harold for the heads-up).