
I have to perform the following AVX operations:

```c
__m256 perm, func;
__m256 in = _mm256_load_ps(inPtr+x);
__m256 acc = _mm256_setzero_ps();

perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(3,2,1,0));
func = _mm256_load_ps(fPtr+0);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));

perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(2,3,0,1));
func = _mm256_load_ps(fPtr+1);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));

perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(1,0,3,2));
func = _mm256_load_ps(fPtr+2);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));

perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(0,1,2,3));
func = _mm256_load_ps(fPtr+3);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));
```
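For reference, the four steps can be modelled in plain C, which makes the data movement explicit and gives something to compare SIMD results against. This is only a sketch: `shuffle_model`, `accumulate_model` and `SHUF` are hypothetical names, and it assumes each step reads its own 8 floats at `f + 8*i` rather than the `fPtr+i` offsets written above.

```c
/* SHUF uses the same bit layout as _MM_SHUFFLE(z, y, x, w) from <xmmintrin.h>:
   two bits per output element, element 0 in the lowest bits. */
#define SHUF(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* Scalar model of _mm256_shuffle_ps(a, a, imm): inside each 128-bit lane
   (elements 0-3 and 4-7), output element j copies input element
   ((imm >> (2*j)) & 3) of the same lane. */
static void shuffle_model(const float in[8], int imm, float out[8])
{
    for (int lane = 0; lane < 2; ++lane)
        for (int j = 0; j < 4; ++j)
            out[lane * 4 + j] = in[lane * 4 + ((imm >> (2 * j)) & 3)];
}

/* acc = sum over the four steps of shuffle(in, imm_i) * f[8*i .. 8*i+7],
   accumulated in the same order as the intrinsic code. */
static void accumulate_model(const float in[8], const float *f, float acc[8])
{
    const int imms[4] = { SHUF(3, 2, 1, 0), SHUF(2, 3, 0, 1),
                          SHUF(1, 0, 3, 2), SHUF(0, 1, 2, 3) };
    float perm[8];

    for (int k = 0; k < 8; ++k)
        acc[k] = 0.0f;
    for (int i = 0; i < 4; ++i) {
        shuffle_model(in, imms[i], perm);
        for (int k = 0; k < 8; ++k)
            acc[k] += perm[k] * f[8 * i + k];
    }
}
```

For example, with `in = {0,1,2,3,4,5,6,7}`, `shuffle_model(in, SHUF(2,3,0,1), out)` swaps neighbouring pairs and yields `{1,0,3,2,5,4,7,6}`.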

This could be rewritten as a loop:

```c
__m256 perm, func;
__m256 in = _mm256_load_ps(inPtr+x);
__m256 acc = _mm256_setzero_ps();
for(int i=0;i<4;++i)
{
    perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(3^i,2^i,1^i,0^i));
    func = _mm256_load_ps(fPtr+i);
    acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));
}
```
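The XOR form reproduces the four constants because `_MM_SHUFFLE(z, y, x, w)` packs two bits per output element, element 0 in the lowest bits, so in iteration `i` output element `j` takes source element `j ^ i` of its 128-bit lane. A minimal check, using a hypothetical `SHUF` macro that copies `_MM_SHUFFLE`'s bit layout so it compiles without the intrinsics headers:

```c
/* Same bit layout as _MM_SHUFFLE(z, y, x, w) from <xmmintrin.h>. */
#define SHUF(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* With a literal i each expression is an integer constant expression;
   these C11 static assertions confirm the XOR form matches the four
   constants written out by hand. */
_Static_assert(SHUF(3 ^ 0, 2 ^ 0, 1 ^ 0, 0 ^ 0) == SHUF(3, 2, 1, 0), "i = 0");
_Static_assert(SHUF(3 ^ 1, 2 ^ 1, 1 ^ 1, 0 ^ 1) == SHUF(2, 3, 0, 1), "i = 1");
_Static_assert(SHUF(3 ^ 2, 2 ^ 2, 1 ^ 2, 0 ^ 2) == SHUF(1, 0, 3, 2), "i = 2");
_Static_assert(SHUF(3 ^ 3, 2 ^ 3, 1 ^ 3, 0 ^ 3) == SHUF(0, 1, 2, 3), "i = 3");

/* Index (0-3, within the 128-bit lane) of the source element that
   output element j receives for a given 8-bit shuffle control. */
static int selected_source(int imm, int j)
{
    return (imm >> (2 * j)) & 3;
}
```

Note that the assertions use literal values of `i`; the loop variable `i` itself is not a constant expression in C.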

This compiles with gcc 4.9.1, even though `_mm256_shuffle_ps` only accepts an immediate integer as its third parameter. This means that `i` is accepted as an immediate, which implies that the loop has been unrolled.

So I am curious: is this guaranteed by the compiler, or could it cause compile errors when the optimization flags are changed or when the gcc version changes? What about other compilers (MSVC, ICC, Clang, ...)?

galinette
  • Have you tried `-O0`? What happens? – mindriot Mar 03 '16 at 09:17
  • Also, how do you arrive at the conclusion that only an immediate is supported? The [GCC implementation](https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/i386/avxintrin.h;hb=HEAD#l331) seems to take a `const int`, and [the Intel spec](https://software.intel.com/en-us/node/583079) appears to state the same. – mindriot Mar 03 '16 at 09:19
  • The `const int` signature says nothing here, since nothing in the language marks a parameter as an immediate. Furthermore, any `imm` parameter in the Intel intrinsics documentation denotes an immediate; this is clearer in the MSDN documentation. Finally, passing a non-immediate value in gcc triggers the error: "the last argument must be an 8-bit immediate" – galinette Mar 03 '16 at 12:36

1 Answer


The intrinsic does require an immediate value. The code compiles only because the optimizer unrolls the loop and turns `i` into a constant in each copy; compiling with `-O0` triggers the following error:

```
(...)\lib\gcc\x86_64-w64-mingw32\4.9.2\include\avxintrin.h:331: error: the last argument must be an 8-bit immediate

      __mask);
            ^
```

A similar case was reported with icc here:

https://software.intel.com/en-us/forums/intel-c-compiler/topic/287217
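One way to get the unrolled code without relying on the optimizer is to unroll by hand with a macro, so each shuffle control is built from a literal and is an integer constant expression at every optimization level. A sketch under stated assumptions: `SHUFFLE_MAC_STEP`, `shuffle_mac` and `AVX_TARGET` are hypothetical names, step `i` is assumed to read 8 floats at `f + 8*i`, and unaligned loads/stores are used so no 32-byte alignment is required. The `target("avx")` attribute is a GCC/Clang extension that lets this single function use AVX without a global `-mavx`; callers should check for AVX support at run time.

```c
#include <immintrin.h>

#if defined(__GNUC__) || defined(__clang__)
/* Compile just this function for AVX (GCC/Clang extension). */
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* One multiply-accumulate step. I is pasted in as a literal, so the
   _MM_SHUFFLE(...) argument is a constant expression and a valid immediate
   even at -O0. Uses the locals acc, in and f of the enclosing function. */
#define SHUFFLE_MAC_STEP(I)                                            \
    acc = _mm256_add_ps(acc,                                           \
              _mm256_mul_ps(                                           \
                  _mm256_shuffle_ps(in, in,                            \
                      _MM_SHUFFLE(3 ^ (I), 2 ^ (I), 1 ^ (I), 0 ^ (I))), \
                  _mm256_loadu_ps(f + 8 * (I))))

/* in_ptr: 8 floats; f: 4 x 8 floats; out: 8 accumulated results. */
static AVX_TARGET void shuffle_mac(const float *in_ptr, const float *f, float *out)
{
    __m256 in  = _mm256_loadu_ps(in_ptr);
    __m256 acc = _mm256_setzero_ps();

    SHUFFLE_MAC_STEP(0);
    SHUFFLE_MAC_STEP(1);
    SHUFFLE_MAC_STEP(2);
    SHUFFLE_MAC_STEP(3);

    _mm256_storeu_ps(out, acc);
}
```

On GCC and Clang the run-time check can be `__builtin_cpu_supports("avx")`. Because the step index is a literal in every expansion, this form should not depend on optimization flags or on the compiler's unrolling heuristics.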

galinette