As the cycle-timing tables for ARM NEON integer operations and ARM NEON floating-point operations show, integer multiply operations do not seem to have a definite advantage over floating-point multiply operations. When I converted my floating-point code to fixed point, I had to add an extra shift instruction after the fixed-point multiplication/division instructions. The extra instructions increased the cycle count, so the fixed-point version actually performs worse: 14000 cycles for the floating-point code versus 26000 cycles for the fixed-point code.
Are there any special NEON instructions dedicated to fixed-point operations (multiplication and division)? I only found one instruction, which just converts between fixed point and floating point. Is there an efficient way of writing fixed-point programs in NEON?
I wrote the following sample floating-point code:
VMUL Q14.F32,Q8.F32,Q2.F32
VMUL Q15.F32,Q8.F32,Q3.F32
VLD2 {Q10.F32,Q11.F32},[pTw2@256],TwdStep
VLD2 {Q4.F32,Q5.F32},[pT1@256],fftSize
VMLA Q14.F32,Q9.F32,Q3.F32
VMLS Q15.F32,Q9.F32,Q2.F32
The following is the same code converted to fixed point, with shift operations inserted after the multiply instructions:
VMUL Q14.S32,Q8.S32,Q2.S32
VMUL Q15.S32,Q8.S32,Q3.S32
VLD2 {Q10.S32,Q11.S32},[pTw2@256],TwdStep
VLD2 {Q4.S32,Q5.S32},[pT1@256],fftSize
VMLA Q14.S32,Q9.S32,Q3.S32
VMLS Q15.S32,Q9.S32,Q2.S32
VRSHR Q14.S32,Q14.S32,#12 ;Shift instructions to account for fixed point
VRSHR Q15.S32,Q15.S32,#12 ;
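To make the arithmetic explicit, here is a minimal scalar C sketch of what I understand one 32-bit lane of the fixed-point sequence above to compute. It assumes a Q12 format (12 fractional bits, inferred from the #12 shift) and ignores the two VLD2 loads. The helper names (mul_lo32, rshr_round, q12_lane) are hypothetical and are not part of any NEON API.

```c
#include <assert.h>
#include <stdint.h>

#define Q_FRAC_BITS 12  /* assumed Q12 format, inferred from the #12 shift */

/* Low 32 bits of a 32x32 product, which is what VMUL.S32 keeps per lane.
   Unsigned arithmetic is used so that wraparound is well defined in C. */
static int32_t mul_lo32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* Rounding shift right, modeling VRSHR.S32 #n: add 2^(n-1), then shift.
   Assumes >> on a negative value is an arithmetic shift, which holds on
   mainstream compilers but is implementation-defined in C. */
static int32_t rshr_round(int32_t x, int n) {
    assert(n >= 1 && n <= 32);
    int64_t rounded = (int64_t)x + ((int64_t)1 << (n - 1));
    return (int32_t)(rounded >> n);
}

/* One lane of the sequence: VMUL, VMUL, VMLA, VMLS, VRSHR, VRSHR. */
static void q12_lane(int32_t q8, int32_t q9, int32_t q2, int32_t q3,
                     int32_t *q14, int32_t *q15) {
    uint32_t acc14 = (uint32_t)mul_lo32(q8, q2);   /* VMUL Q14,Q8,Q2 */
    uint32_t acc15 = (uint32_t)mul_lo32(q8, q3);   /* VMUL Q15,Q8,Q3 */
    acc14 += (uint32_t)mul_lo32(q9, q3);           /* VMLA Q14,Q9,Q3 */
    acc15 -= (uint32_t)mul_lo32(q9, q2);           /* VMLS Q15,Q9,Q2 */
    *q14 = rshr_round((int32_t)acc14, Q_FRAC_BITS); /* VRSHR Q14,#12 */
    *q15 = rshr_round((int32_t)acc15, Q_FRAC_BITS); /* VRSHR Q15,#12 */
}
```

For example, with Q12 inputs q8 = 6144 (1.5), q9 = 2048 (0.5), q2 = 1024 (0.25) and q3 = 8192 (2.0), the model gives q14 = 5632 (1.375) and q15 = 11776 (2.875). Because the products are kept in 32 bits before the shift, two Q12 operands can only use about 31 bits between them before the result wraps.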