I am doing some image processing, for which I benefit from vectorization.
I have a function that vectorizes fine, but I cannot convince the compiler that the input and output buffers do not overlap, so that no alias checking is necessary.
I should be able to do so using __restrict__, but if the buffers are not declared __restrict__ when they arrive as function arguments, there seems to be no way to tell the compiler that I am absolutely sure the two buffers will never overlap.
This is the function:
    __attribute__((optimize("tree-vectorize","tree-vectorizer-verbose=6")))
    void threshold(const cv::Mat& inputRoi, cv::Mat& outputRoi, const unsigned char th) {
        const int height = inputRoi.rows;
        const int width = inputRoi.cols;
        for (int j = 0; j < height; j++) {
            // Row pointers are cast to __restrict in the hope of skipping the alias check
            const uint8_t* __restrict in = (const uint8_t* __restrict) inputRoi.ptr(j);
            uint8_t* __restrict out = (uint8_t* __restrict) outputRoi.ptr(j);
            for (int i = 0; i < width; i++) {
                out[i] = (in[i] < th) ? 255 : 0;
            }
        }
    }
The only way I can convince the compiler not to perform the alias check is to put the inner loop in a separate function whose pointer arguments are declared __restrict__. If I declare that inner function as inline, the alias check is activated again.
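To make the workaround concrete, here is a minimal sketch of what I mean. It uses plain byte buffers instead of cv::Mat so it compiles on its own; the names thresholdRow and thresholdImage, the stride parameter, and the explicit noinline attribute are only illustrative assumptions, not the code from my project.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: thresholds a single row.
// __restrict__ on the parameters promises GCC that in and out never overlap.
// noinline keeps the loop in its own function, since inlining brought the
// alias check back in my tests.
__attribute__((noinline))
void thresholdRow(const std::uint8_t* __restrict__ in,
                  std::uint8_t* __restrict__ out,
                  std::size_t width,
                  std::uint8_t th) {
    for (std::size_t i = 0; i < width; ++i) {
        out[i] = (in[i] < th) ? 255 : 0;
    }
}

// Plain-buffer stand-in for the cv::Mat version (assumption: consecutive rows
// start stride bytes apart in both buffers).
void thresholdImage(const std::uint8_t* in, std::uint8_t* out,
                    std::size_t height, std::size_t width,
                    std::size_t stride, std::uint8_t th) {
    for (std::size_t j = 0; j < height; ++j) {
        thresholdRow(in + j * stride, out + j * stride, width, th);
    }
}
```

The cost of this approach is one function call per row, which is what I was hoping to avoid.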
You can see the effect also with this example, which I think is consistent: http://goo.gl/7HK5p7
(Note: I know there might be better ways of writing the same function, but in this case I am just trying to understand how to avoid the alias check.)
Edit:
Problem is solved!! (See answer below)
Using gcc 4.9.2, here is the complete example. Note the use of the compiler flag -fopt-info-vec-optimized in place of the superseded -ftree-vectorizer-verbose=N.
So, for gcc, use #pragma GCC ivdep and enjoy! :)
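For anyone landing here, a minimal sketch of how the pragma is placed, again on plain buffers rather than cv::Mat. The function name thresholdRowIvdep is just illustrative. The pragma has to sit directly before the loop it applies to, and it is a promise to the compiler: if the buffers do overlap, the vectorized result may be wrong.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical name; no __restrict__ on the parameters here.
void thresholdRowIvdep(const std::uint8_t* in, std::uint8_t* out,
                       std::size_t width, std::uint8_t th) {
    // Ask GCC (4.9 or later) to assume there are no loop-carried dependencies
    // between iterations, so it can skip the runtime alias check.
#pragma GCC ivdep
    for (std::size_t i = 0; i < width; ++i) {
        out[i] = (in[i] < th) ? 255 : 0;
    }
}
```

Building with something like g++ -O3 -fopt-info-vec-optimized should report whether the loop was vectorized; the exact wording of the report varies between gcc versions.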