I don't understand how to choose between vbit, vbsl and vbif when using NEON intrinsics. I need to do the vbit operation, but when I use the vbslq intrinsic I don't get the result I want.

For example I have a source vector like this:

uint8x16_t source = 39 62 9b 52 34 5b 47 48 47 35 0 0 0 0 0 0

The destination vector is:

uint8x16_t destination = 0 0 0 0 0 0 0 0 0 0 0 0 c3 c8 c5 d5

I would like to have as an output this:

39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5

meaning that I want to copy the first ten bytes from the source and leave the other 6 unchanged. I'm using this mask:

{0,0,0,0,0,0,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF};

What is the correct way to use vbslq_u8?

Paul R
user1926328
  • Which intrinsic are you using and what do you want it to do? Maybe you could post the relevant section of your code as it is now and explain what you need to happen? – Paul R Sep 13 '13 at 11:15
  • I need to do exactly the same thing as in http://stackoverflow.com/questions/18312814/arm-neon-conditional-store-suggestion . I tried to follow the answer there, but the result I get is not what I want. The intrinsic I use is vbslq_u8, but I don't understand exactly what it does. – user1926328 Sep 13 '13 at 11:21
  • OK - so post the code, also post an example of the input data you are passing to the intrinsic, what you expect the output data to be, and what the actual data is. – Paul R Sep 13 '13 at 11:23
  • @PaulR I edited my question with all the info. – user1926328 Sep 13 '13 at 11:53

1 Answer

The ARM documentation is not very clear, but it looks like you would need to use the intrinsic like this:

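#include <arm_neon.h>

// Bytes to copy from (only the first ten are wanted)
uint8x16_t src =  {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                   0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x00};
// Bytes to keep wherever the mask is 0
uint8x16_t dest = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                   0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
// 0xff selects the byte from src, 0x00 keeps the byte from dest
uint8x16_t mask = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
                   0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};

// The mask is the first argument
dest = vbslq_u8(mask, src, dest);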

Note that the order of bytes in the mask needs to correspond to the order in the source/dest registers (they seem to be swapped in your question: the mask there starts with six zero bytes, but it is the first ten bytes you want to copy).

Also note that the first parameter to the intrinsic appears to be the selection mask, where a 1 bit selects the corresponding bit from the second parameter and a 0 bit selects the corresponding bit from the third parameter.
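To make the selection rule concrete, here is a minimal scalar sketch in portable C that models the bitwise select on plain byte arrays, so it can be checked without ARM hardware. The helper names `bsl_u8x16` and `check_question_example` are made up for illustration and are not part of `arm_neon.h`; the data is taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 16-byte bitwise select (not a NEON intrinsic).
   For every bit: result = mask bit set ? bit from a : bit from b. */
static void bsl_u8x16(uint8_t out[16], const uint8_t mask[16],
                      const uint8_t a[16], const uint8_t b[16])
{
    for (size_t i = 0; i < 16; ++i) {
        out[i] = (uint8_t)((mask[i] & a[i]) | (~mask[i] & b[i]));
    }
}

/* Runs the example from the question through the model and checks the result. */
static void check_question_example(void)
{
    static const uint8_t src[16]  = {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                                     0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x00};
    static const uint8_t dest[16] = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                                     0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
    /* First ten bytes come from src, the last six are kept from dest. */
    static const uint8_t mask[16] = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
                                     0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};
    static const uint8_t want[16] = {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                                     0x47,0x35,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
    uint8_t out[16];

    bsl_u8x16(out, mask, src, dest);
    assert(memcmp(out, want, sizeof out) == 0);
}
```

The argument order of the model mirrors the intrinsic call above: mask first, then the value taken where the mask is 1, then the value taken where it is 0.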

Paul R
  • It works. The problem was that I had to pass the mask as the first parameter and I wasn't doing that. The mask also had to be swapped. Now everything is OK. Thank you. – user1926328 Sep 13 '13 at 14:44
  • Yes, the weird thing about the `vbsl` instruction is that the output register is also the mask input register, hence the slightly unexpected parameter ordering in the corresponding intrinsic. – Paul R Sep 13 '13 at 14:46
  • The ordering matches the C conditional operator. Think of vbsl(a,b,c) like the C expression (a ? b : c), but on individual bits. – Al Grant Sep 19 '13 at 23:15
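On the original question of how vbit, vbsl and vbif differ: as a rough sketch, all three instructions compute the same bitwise select and differ only in which operand shares a register with the result (the mask for VBSL, the "false" value for VBIT, the "true" value for VBIF). The single-byte models below are hypothetical helpers written for illustration, not intrinsics. In practice a single `vbsl` intrinsic is used, and the compiler is generally free to emit whichever of the three instructions suits its register allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise (mask ? a : b) on one byte; hypothetical helper for illustration. */
static uint8_t select_bits(uint8_t mask, uint8_t a, uint8_t b)
{
    return (uint8_t)((mask & a) | (~mask & b));
}

/* VBSL Vd, Vn, Vm: Vd holds the mask and is overwritten with the result. */
static uint8_t model_vbsl(uint8_t vd, uint8_t vn, uint8_t vm)
{
    return select_bits(vd, vn, vm);
}

/* VBIT Vd, Vn, Vm: insert bits of Vn into Vd where the mask Vm is 1. */
static uint8_t model_vbit(uint8_t vd, uint8_t vn, uint8_t vm)
{
    return select_bits(vm, vn, vd);
}

/* VBIF Vd, Vn, Vm: insert bits of Vn into Vd where the mask Vm is 0. */
static uint8_t model_vbif(uint8_t vd, uint8_t vn, uint8_t vm)
{
    return select_bits(vm, vd, vn);
}

/* All three forms agree once the operands are placed accordingly. */
static void check_equivalence(void)
{
    const uint8_t mask = 0xF0, src = 0xAB, dest = 0xCD;

    assert(model_vbsl(mask, src, dest) == select_bits(mask, src, dest));
    assert(model_vbit(dest, src, mask) == select_bits(mask, src, dest));
    /* VBIF inserts where the mask is 0, so the mask is inverted here. */
    assert(model_vbif(dest, src, (uint8_t)~mask) == select_bits(mask, src, dest));
}
```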