
I previously asked a question about comparison with vclt_s8: Does anybody know how to use Neon intrinsics uint8x8_t vclt_s8 (int8x8_t, int8x8_t)

However, suppose we have code like this:

if(a > b + c) {
    a = b + c;
} else if(a < b - c) {
    a = b - c;
}

How can I translate this into Neon intrinsics? It seems that we cannot perform the 8 operations in parallel in this case, can we?

BonderWu

1 Answer


Obviously you can't branch per element with SIMD, so you need to implement this kind of logic in a branchless way, using masks. I'll just give pseudocode so you get the general idea; coding this should be fairly straightforward:

bc = b + c       ; get `(b + c)` in a vector register
mask = a > bc    ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask   ; use bitwise AND to zero out elements of `(b + c)` which we do not want
a = a & ~mask    ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc       ; combine required elements into `a` using bitwise OR

bc = b - c       ; get `(b - c)` in a vector register
mask = a < bc    ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask   ; use bitwise AND to zero out elements of `(b - c)` which we do not want
a = a & ~mask    ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc       ; combine required elements into `a` using bitwise OR
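A minimal sketch of the pseudocode above with ARM NEON intrinsics in C, processing 8 signed bytes at once. The function name `clamp8` and its pointer-based interface are hypothetical, and using the saturating `vqadd_s8`/`vqsub_s8` (so that `b + c` and `b - c` cannot wrap around in 8 bits, as plain `vadd_s8`/`vsub_s8` would) is my assumption rather than something the pseudocode requires. `vbic_s8(x, y)` computes `x & ~y`, i.e. the ANDC step. The `#else` branch is a hypothetical scalar fallback only so the sketch also compiles on hosts without `<arm_neon.h>`.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: the two independent clamps (no else) on 8 lanes. */
static void clamp8(int8_t *a, const int8_t *b, const int8_t *c)
{
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    int8x8_t vc = vld1_s8(c);

    /* bc = b + c (saturating: an assumption to avoid 8-bit wraparound) */
    int8x8_t bc = vqadd_s8(vb, vc);
    uint8x8_t mask = vcgt_s8(va, bc);   /* all 1s where a > b + c */
    int8x8_t m = vreinterpret_s8_u8(mask);
    bc = vand_s8(bc, m);                /* bc & mask */
    va = vbic_s8(va, m);                /* a & ~mask */
    va = vorr_s8(va, bc);               /* a | bc    */

    /* bc = b - c, then the same select for a < b - c (uses the updated a) */
    bc = vqsub_s8(vb, vc);
    mask = vclt_s8(va, bc);             /* all 1s where a < b - c */
    m = vreinterpret_s8_u8(mask);
    bc = vand_s8(bc, m);
    va = vbic_s8(va, m);
    va = vorr_s8(va, bc);

    vst1_s8(a, va);
}
#else
/* Hypothetical scalar fallback with the same per-lane semantics. */
static int8_t sat8(int v)
{
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

static void clamp8(int8_t *a, const int8_t *b, const int8_t *c)
{
    for (int i = 0; i < 8; i++) {
        int8_t hi = sat8(b[i] + c[i]);
        int8_t lo = sat8(b[i] - c[i]);
        if (a[i] > hi) a[i] = hi;
        if (a[i] < lo) a[i] = lo;
    }
}
#endif
```

Both mask intrinsics return `uint8x8_t`, so the sketch reinterprets the mask as `int8x8_t` before combining it with the signed data; this only changes the type, not the bits.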

Note that I've cheated a little here and omitted the else from your scalar code (assuming that the two branches are mutually exclusive), so what I've implemented is actually equivalent to:

if (a > b + c) {
    a = b + c;
}
if (a < b - c) {
    a = b - c;
}
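To see when dropping the `else` matters, here is a small scalar C check; the function names are hypothetical. When `c >= 0` we have `b + c >= b - c`, so after the first assignment `a = b + c` the second test `a < b - c` is always false and the two forms agree. They can only differ when `c < 0`.

```c
#include <assert.h>

/* The question's original logic: if / else if. */
static int clamp_else(int a, int b, int c)
{
    if (a > b + c) {
        a = b + c;
    } else if (a < b - c) {
        a = b - c;
    }
    return a;
}

/* The simplified logic: two independent tests. */
static int clamp_two_ifs(int a, int b, int c)
{
    if (a > b + c) {
        a = b + c;
    }
    if (a < b - c) {
        a = b - c;
    }
    return a;
}
```

For example, with `a = 2, b = 2, c = -1`, `clamp_else` returns 1 while `clamp_two_ifs` returns 3.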

If this assumption doesn't hold, you'll need a few additional bitwise operations to implement the logical else.
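One way to do that is sketched below, assuming NEON intrinsics in C; `clamp8_else` is a hypothetical name, and the saturating `vqadd_s8`/`vqsub_s8` are again my choice to avoid 8-bit wraparound. Both masks are computed from the original `a`, and the second mask is cleared wherever the first is set (`vbic_u8(lt, gt)` is `lt & ~gt`), which gives the `else if` behaviour. The sketch also uses `vbsl_s8`, NEON's bitwise select, which performs the AND / AND-NOT / OR sequence in one operation; its mask argument is a `uint8x8_t`, the type the compare intrinsics return. The `#else` branch is a hypothetical scalar fallback so it also builds off ARM.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: exact if / else if semantics on 8 lanes. */
static void clamp8_else(int8_t *a, const int8_t *b, const int8_t *c)
{
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    int8x8_t vc = vld1_s8(c);

    int8x8_t hi = vqadd_s8(vb, vc);   /* b + c (saturating) */
    int8x8_t lo = vqsub_s8(vb, vc);   /* b - c (saturating) */

    uint8x8_t gt = vcgt_s8(va, hi);   /* if (a > b + c), original a      */
    uint8x8_t lt = vclt_s8(va, lo);   /* else if (a < b - c), original a */
    lt = vbic_u8(lt, gt);             /* else: only where gt is false    */

    va = vbsl_s8(gt, hi, va);         /* gt ? b + c : a */
    va = vbsl_s8(lt, lo, va);         /* lt ? b - c : a */

    vst1_s8(a, va);
}
#else
/* Hypothetical scalar fallback with the same per-lane semantics. */
static int8_t sat8(int v)
{
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

static void clamp8_else(int8_t *a, const int8_t *b, const int8_t *c)
{
    for (int i = 0; i < 8; i++) {
        int8_t hi = sat8(b[i] + c[i]);
        int8_t lo = sat8(b[i] - c[i]);
        if (a[i] > hi) {
            a[i] = hi;
        } else if (a[i] < lo) {
            a[i] = lo;
        }
    }
}
#endif
```

Compared with the version without the `else`, this costs one extra mask operation but gives the correct result for negative `c` (e.g. `a = 2, b = 2, c = -1` yields 1).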

Paul R
  • Let a = 2, b = 2, c = -1... You'll need to handle that else properly! (OP is using 's8' intrinsics so I guess signed is important to them) – James Greenhalgh Dec 09 '13 at 09:57
  • @James: not necessarily - it depends on the use case. In the context of image/signal processing my guess is that `c` is always positive; if that's not the case, however, it's easy enough to add a few more bitwise instructions to implement the `else`, as I said above, but you don't want to do this and sacrifice performance if it's not needed. – Paul R Dec 09 '13 at 10:36
  • Should mask be signed or unsigned, i.e. int8x8_t mask or uint8x8_t mask? – BonderWu Dec 09 '13 at 10:52
  • @BonderWu: it really doesn't matter - each element is all 1s for "true" (think of this as 255 or -1, whichever you prefer) and all 0s for "false". See the accepted answer to your [previous question](http://stackoverflow.com/questions/20389970/does-anybody-know-how-to-use-neon-intrinsics-uint8x8-t-vclt-s8-int8x8-t-int8x8) where they use an unsigned mask result. – Paul R Dec 09 '13 at 11:12
  • @BonderWu: Accept the answer if this helped you solve the problem – Anoop K. Prabhu Dec 11 '13 at 13:08
  • @Anoop K. Prabhu, OK, thanks. I was on sick leave yesterday, sorry. – BonderWu Dec 12 '13 at 02:31