I would like to know whether it is possible to downsample an image by a factor of 3 using NEON vectors. I'm trying to work out an algorithm on paper, but it does not seem possible: if you load, for example, 8 bytes, you cannot form complete 3x3 pixel blocks, so there are not enough pixels to finish the downsampling operation. Following the downsample-by-2 approach in "Explaining ARM Neon Image Sampling", I am thinking about loading 16 bytes and then 8 bytes from one row, combining them into a 32-byte vector, and then processing 24 bytes of that vector. Would that work?
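One way to picture the 24-byte idea: the NEON intrinsic vld3_u8 loads 24 consecutive bytes and de-interleaves them into three 8-lane vectors, so that lane i of the three vectors holds the three horizontally adjacent pixels of output pixel i. The plain-C sketch below only emulates that data layout for illustration. The helper name `deinterleave3` is hypothetical and is not a NEON function.

```c
#include <stdint.h>

/* Scalar emulation of a 3-way de-interleaving load (the layout vld3_u8
   produces): 24 consecutive bytes are split so that lane i of a, b and c
   holds src[3*i], src[3*i + 1] and src[3*i + 2] respectively.
   deinterleave3 is a hypothetical helper used only for illustration. */
void deinterleave3(const uint8_t src[24],
                   uint8_t a[8], uint8_t b[8], uint8_t c[8])
{
    for (int i = 0; i < 8; i++) {
        a[i] = src[3 * i];      /* first pixel of each group of three  */
        b[i] = src[3 * i + 1];  /* second pixel of each group of three */
        c[i] = src[3 * i + 2];  /* third pixel of each group of three  */
    }
}
```

With this layout, adding a, b and c lane by lane gives the horizontal sum of each group of three pixels, so 24 input bytes yield 8 partial results per row.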
Update: I have written some sample code based on the answer, but I get a segmentation fault in the vst1_u8 call:
inline void downsample3dOnePass( uint8_t* src, uint8_t *dst, int srcWidth)
{
    // make sure rows/cols are divisible by 8
    int rows = ((srcWidth>>3)<<3);
    // 8 pixels per row
    rows = rows>>3;
    for (int r = 0; r < rows; r++)
    {
        // load 24 pixels (grayscale)
        uint8x8x3_t pixels = vld3_u8(src);
        // first sum = d0 + d1
        uint8x8_t firstSum = vadd_u8 ( pixels.val[0], pixels.val[1] );
        // second sum = d1 + d2
        uint8x8_t secondSum = vadd_u8 ( firstSum, pixels.val[2] );
        // total sum = d0 + d1 + d2
        uint8x8_t totalSum = vadd_u8(secondSum, firstSum);
        // average = (d0 + d1 + d2) / 8, approximating / 9 for this test
        uint8x8_t totalAverage = vshr_n_u8(totalSum,3);
        // store 8 bytes
        vst1_u8(dst, totalAverage);
        // move to next 3 rows
        src += 24;
        // move to next row
        dst += 8;
    }
}
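A possible reading of the crash, judging only from the code above: the loop runs srcWidth/8 times, but each iteration consumes 24 source bytes and stores 8 output bytes, so it touches roughly 3*srcWidth input bytes and srcWidth output bytes, while one row only has srcWidth input bytes and produces srcWidth/3 outputs. The 8-bit additions can also wrap around, since three pixels may sum to more than 255. The sketch below is one way the loop could be restructured, not a definitive implementation. It assumes a grayscale image, a caller that passes a pointer to the first of three rows plus a row stride, and rounding to nearest. The names `downsample3_three_rows`, `box3x3` and `srcStride` are hypothetical. The NEON path is only compiled when the compiler reports NEON support, and a scalar loop handles the tail and non-ARM builds.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Average of the 3x3 block whose top-left pixel is p, rounded to nearest. */
static uint8_t box3x3(const uint8_t *p, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 3; y++)
        for (int x = 0; x < 3; x++)
            sum += p[y * stride + x];
    return (uint8_t)((sum + 4) / 9);
}

/* Reduce three source rows of srcWidth pixels to one row of srcWidth/3
   pixels. src points at the first of the three rows, srcStride is the
   distance in bytes between rows, dst must hold srcWidth/3 bytes. */
void downsample3_three_rows(const uint8_t *src, int srcStride,
                            uint8_t *dst, int srcWidth)
{
    int outWidth = srcWidth / 3;
    int o = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* 8 output pixels per iteration = 24 source columns from each row. */
    for (; o + 8 <= outWidth; o += 8) {
        uint16x8_t sum = vdupq_n_u16(4);  /* +4 so the division rounds */
        for (int y = 0; y < 3; y++) {
            uint8x8x3_t px = vld3_u8(src + y * srcStride + 3 * o);
            sum = vaddw_u8(sum, px.val[0]);  /* widen to 16 bits: no wrap */
            sum = vaddw_u8(sum, px.val[1]);
            sum = vaddw_u8(sum, px.val[2]);
        }
        /* Divide by 9 as (n * 7282) >> 16, exact for n < 32768;
           here n is at most 9 * 255 + 4 = 2299. */
        uint32x4_t lo = vmull_n_u16(vget_low_u16(sum), 7282);
        uint32x4_t hi = vmull_n_u16(vget_high_u16(sum), 7282);
        uint16x8_t q = vcombine_u16(vshrn_n_u32(lo, 16), vshrn_n_u32(hi, 16));
        vst1_u8(dst + o, vmovn_u16(q));
    }
#endif
    /* Scalar tail (and the whole row when NEON is unavailable). */
    for (; o < outWidth; o++)
        dst[o] = box3x3(src + 3 * o, srcStride);
}
```

For a full image, the caller would invoke this once per group of three source rows, advancing src by 3 * srcStride and dst by one output row each time. The key differences from the code in the update are the loop bound (srcWidth/3 outputs, 8 at a time) and the 16-bit accumulation.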