I am trying to calculate the following using NEON in assembly: ((200*(53-255))/255) + 255, whose result should be approximately 97.
I've tested it here http://szeged.github.io/nevada/ and also on a tablet with a dual-core Cortex-A7 ARM CPU. In both cases the result is 243, which is not correct.
How should I implement this to get the correct result of 97?
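As a reference point, the intended arithmetic can be sketched in scalar C. This is only a minimal sketch: the function name `expected_lane` is hypothetical, and it assumes every intermediate value is held in a 32-bit signed integer with C's truncating division, which is not what the NEON code below does.

```c
#include <stdint.h>

/* Scalar reference for one lane: ((a * (b - c)) / c) + c.
   All intermediates are widened to 32-bit signed integers so nothing wraps.
   Note that the product -40400 does not fit in a signed 16-bit lane. */
int32_t expected_lane(uint8_t a, uint8_t b, uint8_t c)
{
    int32_t diff = (int32_t)b - (int32_t)c;   /* 53 - 255 = -202 */
    int32_t prod = (int32_t)a * diff;         /* 200 * -202 = -40400 */
    /* C integer division truncates toward zero: -40400 / 255 = -158. */
    return prod / (int32_t)c + (int32_t)c;    /* -158 + 255 = 97 */
}
```

Calling `expected_lane(200, 53, 255)` gives 97, the value each of the eight lanes should end up with.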
d2 contains 200,200,200,200,200,200,200,200
d4 contains 255,255,255,255,255,255,255,255
d6 contains 53,53,53,53,53,53,53,53
vsub.s8 d8, d6, d4 (53 - 255 results in d8 = 54,54,54,54,54,54,54,54)
vmull.s8 q5,d8,d2 (54 * 200 results in q5 = 244,48,244,48,244,48,244,48,244,48,244,48,244,48,244,48)
vshrn.s16 d12, q5, #8 (intended as divide by 255; a right shift by 8 actually divides by 256; results in d12 = 244,244,244,244,244,244,244,244)
vadd.s8 d5, d4, d12 (final result d5 = 243,243,243,243,243,243,243,243)
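The register values listed above can be reproduced with a scalar model of a single lane. This is a hedged sketch rather than an exact emulation of NEON: the function name `neon_lane_trace` is hypothetical, and it assumes the usual two's-complement behaviour when a value such as 200 is converted to `int8_t` (true for GCC and Clang).

```c
#include <stdint.h>

/* Scalar model of one lane of the four instructions above. */
uint8_t neon_lane_trace(uint8_t d2, uint8_t d4, uint8_t d6)
{
    /* vsub.s8: 8-bit subtraction wraps modulo 256 (53 - 255 -> 54). */
    uint8_t d8 = (uint8_t)(d6 - d4);

    /* vmull.s8: both inputs are read as signed bytes, so 200 is seen as -56.
       54 * -56 = -3024, which is 0xF430 as a 16-bit pattern (bytes 244 and 48). */
    int16_t q5 = (int16_t)((int8_t)d8 * (int8_t)d2);

    /* vshrn #8: shift right by 8 and keep the low byte, i.e. the high byte
       of the 16-bit lane (0xF4 = 244). */
    uint8_t d12 = (uint8_t)((uint16_t)q5 >> 8);

    /* vadd.s8: 8-bit addition wraps again (255 + 244 = 499 -> 243). */
    return (uint8_t)(d4 + d12);
}
```

With the inputs from the question, `neon_lane_trace(200, 255, 53)` returns 243. In this model the value already goes wrong at the subtraction, because -202 cannot be represented in 8 bits, and again at the multiply, where 200 is interpreted as -56.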