About vsubq_u16(uint16x8_t, uint16x8_t)

Question

About

vsubq_u16(uint16x8_t a, uint16x8_t b)

The return value is also uint16x8_t. So if a lane of a is smaller than the corresponding lane of b, the result wraps around to a very large unsigned value instead of a negative one, which is not what I need.

If I have a requirement like this:

uint16_t c = fabs((int)a - (int)b);  /* a and b are uint16_t */

How can I translate this into NEON intrinsics? Thanks.

score 4 · Accepted Answer · answered Dec 12 '13 at 09:27


It looks like you want the absolute difference between your inputs. If so, the following intrinsic does exactly that:

uint16x8_t vabdq_u16 (uint16x8_t, uint16x8_t)
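As a minimal sketch of how this intrinsic might be used on eight lanes at once: the helper name abs_diff_u16x8 and the scalar fallback are assumptions added for illustration, not part of the original answer. The fallback is there only so the code also builds on non-ARM hosts.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = |a[i] - b[i]| for 8 unsigned 16-bit lanes. */
void abs_diff_u16x8(const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint16x8_t va = vld1q_u16(a);        /* load 8 lanes from a */
    uint16x8_t vb = vld1q_u16(b);        /* load 8 lanes from b */
    vst1q_u16(out, vabdq_u16(va, vb));   /* per-lane absolute difference */
#else
    /* Scalar fallback (assumption): same result, one lane at a time. */
    for (int i = 0; i < 8; ++i)
        out[i] = (uint16_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
#endif
}
```

With inputs 1 and 2 in either order, the corresponding output lane is 1, which is the behaviour asked for in the question.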


Nils Pipenbrinck


score 1 · Answer 2 · answered Dec 12 '13 at 06:52

I have seen a series of questions from you in the NEON section, and I suspect you are confusing yourself by overthinking the NEON instructions. So I will give a more general answer to this question.

Some basics to be clear about before going deep into NEON intrinsics:

Binary representation of negative and positive numbers.
Range of unsigned char, signed char, unsigned int, signed int etc.
- Range of unsigned char -> 0 to 255
- Range of signed char -> -128 to 127

These ranges always hold when the instructions are applied. When programming with intrinsics, we must first know the exact range of the results we may get.

int8x8_t c = vsub_s8(int8x8_t a, int8x8_t b)

All the variables in this expression are in the range -128 to 127.

uint8x8_t c = vsub_u8(uint8x8_t a, uint8x8_t b)

All the variables are in the range [0, 255], so we have to make sure the result stays within that range. This subtraction gives the arithmetically correct result only if b is less than or equal to a; otherwise the result wraps around modulo 256. In other words, if a and b are in [0, 255], the true difference lies in [-255, 255], which clearly cannot be represented in 8 bits. The result needs a 16-bit representation. vsubl_u8 widens the result to 16 bits; it returns a uint16x8_t, so a negative difference such as 1 - 2 appears as 0xffff until the lanes are reinterpreted as signed 16-bit values.
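A minimal sketch of the widening subtraction, assuming the goal is to read the signed difference of two uint8 vectors. The helper name widening_sub_u8x8 and the scalar fallback are assumptions added for illustration; the fallback only lets the code build on non-ARM hosts.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] - b[i], kept as a signed 16-bit value. */
void widening_sub_u8x8(const uint8_t a[8], const uint8_t b[8], int16_t out[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Widening subtract: 8-bit inputs, 16-bit lanes; 1 - 2 yields 0xFFFF. */
    uint16x8_t wide = vsubl_u8(vld1_u8(a), vld1_u8(b));
    /* Same bits viewed as signed: 0xFFFF is read as -1. */
    vst1q_s16(out, vreinterpretq_s16_u16(wide));
#else
    /* Scalar fallback (assumption): compute the difference in int. */
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)((int)a[i] - (int)b[i]);
#endif
}
```

Because the true difference always lies in [-255, 255], it fits in a signed 16-bit lane, so the reinterpretation loses nothing.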

Visualizing arithmetic on base-2 numbers will help you get comfortable with intrinsic code. Do your own homework on NEON intrinsics by creating a test project that loads two arrays, and inspect the output in a debugger. The intrinsics are never that complex, so there is nothing better than good homework. :)

Thanks for your suggestions! I have set up a test project in my R&D environment where I can write small pieces of code and test them quickly. However, in my case, for fabs(1-2) and fabs(2-1), how can I get the correct result 1? With vsubl_u8() the result is 0xffff. — BonderWu, Dec 12 '13 at 08:05
Oops.. I forgot to answer the main point :P Nils has already given the answer :) — Anoop K. Prabhu, Dec 12 '13 at 11:01
