I am trying out the Auto-Vectorizer mode of Visual Studio 2013 on x86_64, and I am a bit surprised by the following. Consider this naive code:
static void rescale( double * __restrict out, const int * __restrict in, long n, const double intercept, const double slope )
{
    for( long i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}
Visual Studio reports that it fails on this naive example with:
--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(13) : info C5012: loop not parallelized due to reason '1008'
The compile command is (I am only interested in SSE2 for now):
cl vec.c /O2 /Qpar /Qpar-report:2
The documentation for reason code 1008 reads:
The compiler detected that this loop does not perform enough work to warrant auto-parallelization.
Is there a way to rewrite this loop so that the Auto-Vectorizer mode is triggered properly?
A simple rewrite that takes the input as double instead of int did not help:
static void rescale( double * __restrict out, const double * __restrict in, long n, const double intercept, const double slope )
{
    for( long i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}
In the above case Visual Studio still reports:
--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(13) : info C5012: loop not parallelized due to reason '1008'
How should I rewrite my initial code to satisfy the Auto-Vectorizer mode of Visual Studio 2013? I would like to compute a * b + c on vectors of 64-bit doubles using SSE2.
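For reference, the a * b + c operation on pairs of 64-bit doubles could be written by hand with SSE2 intrinsics roughly as below. This is only a sketch of the kind of code I am hoping the compiler generates on its own. The name `rescale_sse2` is hypothetical, and the sketch assumes nothing about pointer alignment, so it uses an unaligned store.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical hand-vectorized counterpart of rescale() (int input).
   Processes two elements per iteration, then finishes with a scalar tail. */
static void rescale_sse2( double *out, const int *in, long n,
                          const double intercept, const double slope )
{
    const __m128d vslope     = _mm_set1_pd( slope );      /* {slope, slope} */
    const __m128d vintercept = _mm_set1_pd( intercept );  /* {intercept, intercept} */
    long i = 0;

    for( ; i + 2 <= n; i += 2 )
    {
        /* Load two 32-bit ints into the low 64 bits of the register. */
        const __m128i vi = _mm_loadl_epi64( (const __m128i *)( in + i ) );
        /* Convert the two low ints to two doubles. */
        const __m128d vd = _mm_cvtepi32_pd( vi );
        /* slope * in + intercept, as a separate multiply and add (no FMA in SSE2). */
        const __m128d r = _mm_add_pd( _mm_mul_pd( vslope, vd ), vintercept );
        /* Unaligned store, since out is not assumed to be 16-byte aligned. */
        _mm_storeu_pd( out + i, r );
    }

    /* Scalar tail for an odd element count. */
    for( ; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}
```

Calling `rescale_sse2( out, in, n, intercept, slope )` should give the same values as the scalar loop, since each element is still computed as one multiply followed by one add.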