I am trying to accelerate a stereo matching algorithm on ODROID XU4 ARM platform using Neon SIMD. For this puropose I am using openMp's pragmas.
void StereoMatch:: sadCol(uint8_t* leftRank,uint8_t* rightRank,const int SAD_WIDTH,const int SAD_WIDTH_STEP, const int imgWidth,int j, int d , uint16_t* cost)
{
uint16_t sum = 0;
int n = 0;
int m =0;
for ( n = 0; n < SAD_WIDTH+1; n++)
{
#pragma omp simd
for( m = 0; m< SAD_WIDTH_STEP; m = m + imgWidth )
{
sum += abs(leftRank[j+m+n]-rightRank[j+m+n-d]);
};
cost[n] = sum;
sum = 0;
};
I am fairly new to SIMD and openMp, I understood that using the SIMD pragma in the code will direct the compiler to vectorize the subtraction, but when I executed the code I noticed no difference. What should I add to my code in order to vectorize it ?