I'm testing on Solaris 11.3 with Sun Studio 12.2. I'm having trouble getting an SSE shuffle to compile:

solaris:$ cat test.cxx
#include <stdint.h>
#include <emmintrin.h>

int main(int argc, char* argv[])
{
    __m128i a, b;
    asm ("pshufb %1, %0" : "+x"(a) : "xm"(b));
    return 0;
}

And then:

solaris:$ /opt/solstudio12.2/bin/CC test.cxx -o test.exe
"test.cxx", line 7: Error: The operand type "__m128i_" is not allowed for the constraint "+x".
1 Error(s) detected.

pshufb is an SSSE3 instruction, but I'm having trouble determining whether Sun Studio 12.2 supports it (searches return too much irrelevant noise). I believe it does. Sun Studio 12.3 and above accept the inline assembly.

Why am I getting the error, and how do I fix it?

jww
  • Why not just use the correct intrinsic ([`_mm_shuffle_epi8`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_shuffle_epi8&expand=4723))? It will not only be more efficient but also more portable between different compilers. – Paul R Apr 24 '17 at 08:21
  • @Paul - *"Why not just use the correct intrinsic..."* - there's no way for us to tell when we are allowed to use the intrinsic. The Sun compilers do not signal feature availability the way Clang and GCC do. Macros like `__SSE2__` and `__SSSE3__` are never defined. In the absence of signalling, we use inline ASM instead. ASM is always available. Also see [Detect -xarch option in the preprocessor?](http://stackoverflow.com/q/38318425/608639) – jww Apr 24 '17 at 08:26
  • Surely if you're targeting hardware that supports SSSE3, then you're presumably compiling with `-xarch=ssse3` (or greater), in which case you should be able to use intrinsics from ``, no? – Paul R Apr 24 '17 at 08:50
  • @Paul - Users may not use an `-xarch=XXX` option. That requires users to RTFM. If that strategy were going to work, it would have happened in the last 40 or 50 years or so. We've found it's better to engineer around users and get into an "it just works" state. When things just work, users get SSE2, SSSE3, SSE4, AES, CLMUL, RDRAND, RDSEED, AVX{2}, BMI{2} and SHA (with appropriate runtime guards) without doing anything. – jww Apr 24 '17 at 08:59
  • OK - so the problem is that you want to have multiple code paths in your source, and you don't have any preprocessor macros or other compile-time mechanisms for testing what CPU you are going to be running on? So you have to test for CPU capabilities at run-time and dispatch accordingly? – Paul R Apr 24 '17 at 09:32
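
For readers following the comments, here is a minimal sketch of the "runtime guard plus inline ASM" approach being discussed. It assumes a GCC- or Clang-compatible compiler on x86-64 (it relies on `<cpuid.h>` and GNU extended asm), so it does not address the Sun Studio 12.2 constraint error from the question. The names `has_ssse3`, `shuffle_scalar`, `shuffle_ssse3`, `shuffle_bytes` and the macro `HAVE_X86_GNU_ASM` are hypothetical and do not come from the thread.

```cpp
#include <stdint.h>
#include <string.h>

// Assumption: only take the inline-asm path on x86-64 with a GNU-style compiler.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
# define HAVE_X86_GNU_ASM 1   // hypothetical macro, not a compiler built-in
# include <cpuid.h>           // __get_cpuid (GCC/Clang header)
# include <emmintrin.h>       // __m128i and SSE2 load/store intrinsics
#endif

// Portable fallback with pshufb semantics: a mask byte with its high bit set
// yields 0, otherwise the low 4 bits of the mask byte index into src.
static void shuffle_scalar(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    uint8_t tmp[16];   // temporary so that dst may alias src
    for (int i = 0; i < 16; ++i)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

#ifdef HAVE_X86_GNU_ASM
// Runtime guard: CPUID leaf 1, ECX bit 9 reports SSSE3 support.
static bool has_ssse3()
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & (1u << 9)) != 0;
}

// Same asm statement as in the question, with the operands initialized.
static void shuffle_ssse3(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    __m128i a = _mm_loadu_si128((const __m128i*)src);
    __m128i b = _mm_loadu_si128((const __m128i*)mask);
    __asm__ ("pshufb %1, %0" : "+x"(a) : "xm"(b));
    _mm_storeu_si128((__m128i*)dst, a);
}
#else
static bool has_ssse3() { return false; }
#endif

// Dispatch: use pshufb when the CPU reports SSSE3, otherwise fall back.
void shuffle_bytes(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
#ifdef HAVE_X86_GNU_ASM
    if (has_ssse3()) {
        shuffle_ssse3(dst, src, mask);
        return;
    }
#endif
    shuffle_scalar(dst, src, mask);
}
```

Callers would only ever call `shuffle_bytes`; no `-m` or `-xarch` option is needed for the asm path because the instruction is emitted as text and guarded at run time. Whether the same pattern can be made to work on Sun Studio 12.2 is exactly the open question above.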
