
I have a buffer of 12-bit data (stored in 16-bit words) and need to convert it into 8-bit data (shift right by 4).

How can NEON accelerate this processing?

Thank you for your help

Brahim

bhamadicharef

2 Answers


I took the liberty of assuming a few things, explained below, but this kind of code (untested; it may require a few modifications) should provide a good speedup compared to a naive non-NEON version:

#include <arm_neon.h>
#include <stdint.h>

void convert(const uint16_t *restrict input, // the buffer to convert
             uint8_t *restrict output,       // the buffer in which to store the result
             int sz) {                       // their (common) size

  /* Assuming the buffer size is a multiple of 8 */
  for (int i = 0; i < sz; i += 8) {
    // Load a vector of 8 16-bit values:
    uint16x8_t v = vld1q_u16(input+i);
    // Shift it right by 4, narrowing it to 8-bit values.
    uint8x8_t shifted = vshrn_n_u16(v, 4);
    // Store it in output buffer
    vst1_u8(output+i, shifted);
  }

}

Things I assumed here:

  • that you're working with unsigned values. If that's not the case, it is easy to adapt anyway (uint* -> int*, *_u8 -> *_s8 and *_u16 -> *_s16).
  • as the values are loaded 8 at a time, I assumed the buffer length is a multiple of 8 to avoid edge cases. If that's not the case, you should probably pad the buffer to a multiple of 8.
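If padding the buffer isn't practical, a scalar loop can pick up the leftover elements. Below is a minimal sketch under a few assumptions: unsigned 12-bit data, a hypothetical function name `convert_any`, and a `size_t` length. The NEON path is compiled only when the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`); on other targets the scalar loop handles the whole buffer, so the same file also builds on x86.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: converts n unsigned 12-bit samples (stored in 16-bit
 * words) to 8 bits by dropping the 4 least significant bits.
 * Unlike the loop above, n does not need to be a multiple of 8. */
void convert_any(const uint16_t *restrict input, uint8_t *restrict output,
                 size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Vector part: 8 samples per iteration with a narrowing shift. */
    for (; i + 8 <= n; i += 8) {
        uint16x8_t v = vld1q_u16(input + i);
        vst1_u8(output + i, vshrn_n_u16(v, 4));
    }
#endif
    /* Scalar tail, or the whole buffer when NEON is not available.
     * A 12-bit value shifted right by 4 is at most 255, so it fits in 8 bits. */
    for (; i < n; i++) {
        output[i] = (uint8_t)(input[i] >> 4);
    }
}
```

With NEON enabled, calling `convert_any(in, out, 11)` processes the first 8 samples in the vector loop and the remaining 3 in the scalar loop.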

Finally, the two pages of the NEON documentation I used:

Hope this helps!

mbrenon
  • You don't need the q flag unless you also set the r flag: there's nothing to saturate here. – Jake 'Alquimista' LEE Sep 11 '13 at 16:03
  • True, that's a bad copy-paste from the NEON documentation! Fixed it. – mbrenon Sep 11 '13 at 17:29
  • Thank you. I will do some tests. I am trying to include such code in a Qt 4.8 application and needed to remove the "const restrict". I changed buf+i to input+i. The data is 12-bit unsigned stored in 16 bits, and is now shifted into 8-bit unsigned. – bhamadicharef Sep 13 '13 at 09:51
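prototype: void dataConvert(void *pDst, void *pSrc, unsigned int count);
    @ r0 = pDst, r1 = pSrc, r2 = count (must be a positive multiple of 32)
    1:
    vld1.16 {q8-q9}, [r1]!          @ load 16 halfwords
    vld1.16 {q10-q11}, [r1]!        @ load 16 more (32 per iteration)
    vqrshrn.u16 d16, q8, #4         @ round, shift right by 4, saturate to 8 bits
    vqrshrn.u16 d17, q9, #4
    vqrshrn.u16 d18, q10, #4
    vqrshrn.u16 d19, q11, #4
    vst1.8 {q8-q9}, [r0]!           @ store the 32 result bytes (d16-d19)
    subs r2, r2, #32                @ 32 elements consumed
    bgt 1b
    bx lr                           @ return to caller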

q flag: saturation

r flag: rounding

Change u16 to s16 in case of signed data.
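To see why the q flag matters only together with the r flag, here is a scalar model of the two narrowing shifts for unsigned data, written as hypothetical helpers `model_vshrn4` and `model_vqrshrn4` (a sketch of the documented instruction semantics, not ARM's code). With rounding, any 12-bit input of 4088 or more rounds up to 256, which no longer fits in 8 bits, so saturation is required; plain truncation of a 12-bit value never exceeds 255. Note also that rounding gives different results from the plain shift asked for in the question, e.g. 24 becomes 2 instead of 1.

```c
#include <stdint.h>

/* Model of vshrn.u16 #4 (vshrn_n_u16): truncating shift right by 4, keeping
 * the low 8 bits. For 12-bit inputs the result always fits in 8 bits. */
static inline uint8_t model_vshrn4(uint16_t v) {
    return (uint8_t)(v >> 4);
}

/* Model of vqrshrn.u16 #4 (vqrshrn_n_u16): add 8 (half of 2^4) in wider
 * precision to round to nearest, shift right by 4, then saturate to 255. */
static inline uint8_t model_vqrshrn4(uint16_t v) {
    uint32_t r = ((uint32_t)v + 8u) >> 4;
    return (uint8_t)(r > 255u ? 255u : r);
}
```

For example, 4095 gives 255 with both helpers, but in the rounding case only because the intermediate 256 is saturated.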

Jake 'Alquimista' LEE