I am trying out the Auto-Vectorizer mode of Visual Studio 2013 on x86_64, and I am a bit surprised by the following. Consider this naive code:

static void rescale( double * __restrict out, const int * __restrict in, long n, const double intercept, const double slope )
{
    for( long i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}

Visual Studio reports that it fails on this naive example with:

--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(13) : info C5012: loop not parallelized due to reason '1008'

The compilation line is (I am only interested in SSE2 for now):

cl vec.c /O2 /Qpar /Qpar-report:2

The documentation for reason code 1008 reads:

The compiler detected that this loop does not perform enough work to warrant auto-parallelization.

Is there a way to rewrite this loop so that the Auto-Vectorizer mode is triggered properly?

A simple rewrite that takes the input as double instead of int did not help:

static void rescale( double * __restrict out, const double * __restrict in, long n, const double intercept, const double slope )
{
    for( long i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}

In the above case Visual Studio still reports:

--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(13) : info C5012: loop not parallelized due to reason '1008'

How should I rewrite my initial code to please the Auto-Vectorizer mode of Visual Studio 2013? I would like to compute a * b + c on vectors of 64-bit doubles using SSE2.

malat

2 Answers

The sample code near the bottom of the MSDN link you posted suggests using the hint_parallel pragma:

void code_1008()
{
    // Code 1008 is emitted when the compiler detects that
    // this loop does not perform enough work to warrant 
    // auto-parallelization.

    // You can resolve this by specifying the hint_parallel
    // pragma. CAUTION -- if the loop does not perform
    // enough work, parallelizing might cause a potentially 
    // large performance penalty.

    // #pragma loop(hint_parallel(0)) //  hint_parallel will force this through
    for (int i=0; i<1000; ++i)
    {
        A[i] = A[i] + 1;
    }
}
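
Applied to the rescale function from the question, the hint would look roughly like the sketch below. The `_MSC_VER` guard is my own addition so that other compilers skip the pragma; `hint_parallel(0)` leaves the thread count to the runtime. Keep in mind that /Qpar and the C5012 message concern the auto-parallelizer (spreading the loop over several threads). The auto-vectorizer is a separate feature: it is on by default at /O2 and reports through /Qvec-report:2 with C5001/C5002 messages, so it is worth checking that report as well.

```c
/* Same loop as in the question, with the MSDN-suggested hint applied. */
static void rescale(double * __restrict out, const int * __restrict in,
                    long n, const double intercept, const double slope)
{
#ifdef _MSC_VER
    /* Only has an effect when the file is compiled with /Qpar. */
#pragma loop(hint_parallel(0))
#endif
    for (long i = 0; i < n; ++i)
        out[i] = slope * in[i] + intercept;
}
```

As the MSDN comment warns, a loop this cheap may well run slower once it is parallelized, so measure before keeping the hint.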
Sean

The second MSDN link you gave contains samples showing how to force the compiler to parallelize the loop.

// You can resolve this by specifying the hint_parallel
// pragma. CAUTION -- if the loop does not perform
// enough work, parallelizing might cause a potentially 
// large performance penalty.

// #pragma loop(hint_parallel(0)) //  hint_parallel will force this through
for (int i=0; i<1000; ++i)
{
    A[i] = A[i] + 1;
}
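
If the goal is simply a * b + c on pairs of 64-bit doubles with SSE2, one alternative that does not rely on the compiler's heuristics is to write the loop with SSE2 intrinsics by hand. This is a different technique from auto-vectorization, and the function name `rescale_sse2` is hypothetical; treat it as a minimal sketch that assumes nothing about the alignment of `in` or `out`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hand-vectorized variant of rescale: two doubles per iteration. */
static void rescale_sse2(double * __restrict out, const int * __restrict in,
                         long n, const double intercept, const double slope)
{
    const __m128d vslope = _mm_set1_pd(slope);
    const __m128d vintercept = _mm_set1_pd(intercept);
    long i = 0;

    for (; i + 2 <= n; i += 2) {
        /* Load two 32-bit ints into the low 64 bits of the register. */
        const __m128i vi = _mm_loadl_epi64((const __m128i *)(in + i));
        /* Convert those two ints to two doubles. */
        const __m128d vd = _mm_cvtepi32_pd(vi);
        /* slope * in[i] + intercept, written with an unaligned store. */
        _mm_storeu_pd(out + i,
                      _mm_add_pd(_mm_mul_pd(vslope, vd), vintercept));
    }

    /* Scalar tail for an odd element count. */
    for (; i < n; ++i)
        out[i] = slope * in[i] + intercept;
}
```

The multiply and add are kept as separate operations, so the results should match the scalar loop when that loop is also compiled without fused multiply-add.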