
I have the following 4x4 matrix-vector multiply code:

// y += A * x, where A is a 4x4 matrix stored column-major
void matvec4(double const* __restrict__ a,   // 16 elements
             double const* __restrict__ x,   // 4 elements
             double*       __restrict__ y)   // 4 elements, accumulated into
{
    //#pragma GCC unroll 1 - does not work either
    #pragma GCC nounroll
    for ( int j = 0; j < 4; ++j )
    {
        double const* __restrict__ aj = a + j * 4;   // column j of A
        double const xj = x[j];

        #pragma GCC ivdep
        for ( int i = 0; i < 4; ++i )
        {
            y[i] += aj[i] * xj;
        }
    }
}

I compile with `-O3 -mavx`. The inner loop is vectorized (a single 4-wide multiply-add). However, GCC 7.2 keeps unrolling the outer loop 4 times unless I use -O2 or a lower optimization level.

Is there a way to override -O3 unrolling of a particular loop?

NB: The equivalent `#pragma nounroll` works if I use Intel icc.

user2052436
`#pragma GCC unroll n` works in GCC 8.x, but only to some extent: as far as I can see, it is more a hint than a strict requirement (for how many times the loop should be unrolled). – Anty Sep 21 '18 at 22:31

2 Answers


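According to the GCC documentation, `#pragma GCC unroll 1` is supposed to work if you place it immediately before the `for` statement it applies to (a value of 0 or 1 blocks any unrolling of that loop). Note that this pragma was introduced in GCC 8, so GCC 7.2 does not recognize it. If it doesn't work on a version that supports it, you should submit a bug report.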
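A minimal sketch of that placement, assuming GCC 8 or later and the column-major layout from the question; the function name `matvec4_nounroll` is hypothetical. The pragma sits on the line directly before the outer `for`, so it applies only to that loop and leaves the inner `ivdep` loop free to be vectorized. As the comment on the question notes, GCC may treat it as a hint, so it is worth checking the generated assembly.

```c
/* Hypothetical helper: y += A * x for a 4x4 column-major matrix A.
   Assumes GCC 8+, where #pragma GCC unroll is recognized. */
void matvec4_nounroll(double const* __restrict__ a,
                      double const* __restrict__ x,
                      double*       __restrict__ y)
{
    /* Placed immediately before the loop it controls;
       a factor of 0 or 1 asks GCC not to unroll this loop. */
    #pragma GCC unroll 1
    for (int j = 0; j < 4; ++j)
    {
        double const* __restrict__ aj = a + j * 4;   /* column j of A */
        double const xj = x[j];

        #pragma GCC ivdep
        for (int i = 0; i < 4; ++i)
            y[i] += aj[i] * xj;
    }
}
```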

Alternatively, I think you can use a function attribute to set the optimization options for a single function:

void myfn () __attribute__((optimize("no-unroll-loops")));
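A sketch of the attribute applied to the loop from the question, assuming GCC; the name `matvec4_attr` is hypothetical. GCC treats a string argument such as `"no-unroll-loops"` as the option `-fno-unroll-loops` for this function only. I have not confirmed that this also suppresses the complete unrolling GCC performs at `-O3`, so inspect the assembly (for example with `-S`) before relying on it.

```c
/* Hypothetical helper: y += A * x for a 4x4 column-major matrix A.
   The prototype carries the attribute, mirroring the myfn example. */
void matvec4_attr(double const* __restrict__ a,
                  double const* __restrict__ x,
                  double*       __restrict__ y)
    __attribute__((optimize("no-unroll-loops")));

void matvec4_attr(double const* __restrict__ a,
                  double const* __restrict__ x,
                  double*       __restrict__ y)
{
    for (int j = 0; j < 4; ++j)
    {
        double const* __restrict__ aj = a + j * 4;   /* column j of A */
        double const xj = x[j];

        #pragma GCC ivdep
        for (int i = 0; i < 4; ++i)
            y[i] += aj[i] * xj;
    }
}
```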
ams

If you need a compact function without full or partial loop unrolling, try the following function attribute:

__attribute__((optimize("Os")))
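A sketch of how that attribute might be attached to the loop from the question, assuming GCC; the name `matvec4_small` is hypothetical. `"Os"` compiles this one function as if with `-Os` (optimize for size), which discourages unrolling, but it changes the other optimizations for the whole function too and may also stop the inner loop from being vectorized, so check the generated code.

```c
/* Hypothetical helper: y += A * x for a 4x4 column-major matrix A,
   compiled for size regardless of the command-line -O level. */
__attribute__((optimize("Os")))
void matvec4_small(double const* __restrict__ a,
                   double const* __restrict__ x,
                   double*       __restrict__ y)
{
    for (int j = 0; j < 4; ++j)
    {
        double const* __restrict__ aj = a + j * 4;   /* column j of A */
        double const xj = x[j];

        for (int i = 0; i < 4; ++i)
            y[i] += aj[i] * xj;   /* accumulates into the existing y */
    }
}
```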
kyle