
I'm working with Sun Studio 12.3 on SunOS 5.11 (Solaris 11.3). It's producing a compile error that I don't quite understand:

$ /opt/solarisstudio12.3/bin/CC -xarch=sse2 -xarch=aes -xarch=sse4_2 -c test.cxx 
"test.cxx", line 11: ube: error: _mm_aeskeygenassist_si128 intrinsic requires at least -xarch=aes.
CC: ube failed for test.cxx

Adding -m64 produces the same error.

There's not much to the test program. It simply exercises an SSE2 intrinsic and an AES intrinsic:

$ cat test.cxx
#include <stdint.h>
#include <wmmintrin.h>
#include <emmintrin.h>
int main(int argc, char* argv[])
{
  // SSE2
  int64_t x[2];
  __m128i y = _mm_loadu_si128((__m128i*)x);

  // AES
  __m128i z = _mm_aeskeygenassist_si128(y,0);

  return 0;
}

I've been working through the manual to learn how to specify multiple CPU architecture features, like SSE2, SSSE3, AES and SSE4, but I can't determine how to combine them. Here's one of the more complete pages I have found: Oracle Man Page CC.1, but I'm obviously missing something with respect to -xarch.

What am I doing wrong, and how do I fix it?

jww
  • Isn't `aes` a superset of `sse2 U sse4_2`? If so, you can provide just the `-xarch=aes` option. – Leon Jun 10 '16 at 11:30
  • @Leon - I don't believe so. We recently took a similar bug report on a non-Solaris system. I also have to work out how RDRAND, RDSEED and BMI2 fit into the scheme of things, so the problem is probably going to get worse before it gets better. – jww Jun 10 '16 at 11:37
  • @Leon - I was looking at the [Man Page cc.1](http://docs.oracle.com/cd/E24457_01/html/E22003/cc.1.html) (little c's) earlier. There's a natural progression for SSE, but AES is not included in the progression. – jww Jun 10 '16 at 11:50
  • Your link is to the Solaris Studio 12.2 man page. The 12.3 documentation [here](http://docs.oracle.com/cd/E24457_01/html/E21991/bkana.html#bkaza) shows an `-xarch=aes` option. – Andrew Henle Jun 10 '16 at 12:39
  • @Leon - I hope I don't sound argumentative. AES is part of Intel's SecureKey. The SecureKey extensions or features may or may not be present independent of other extensions or features. For AMD processors, RDRAND, which is also part of SecureKey, came 3 years later. On occasion, I run into these corner cases. – jww Jun 12 '16 at 09:28

2 Answers

3

This command line

$ /opt/solarisstudio12.3/bin/CC -xarch=sse2 -xarch=aes -xarch=sse4_2 -c test.cxx 

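uses only the last of the three options, -xarch=sse4_2, because each -xarch option overrides the one before it. The compiler therefore emits sse4_2-compatible binaries without AES enabled, which is why the AES intrinsic is rejected.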

This is documented in Chapter 3 of the C++ User's Guide:

3.2 General Guidelines

Some general guidelines for the C++ compiler options are:

  • The -llib option links with library liblib.a (or liblib.so). It is always safer to put -llib after the source and object files to ensure the order in which libraries are searched.

  • In general, processing of the compiler options is from left to right (with the exception that -U options are processed after all -D options), allowing selective overriding of macro options (options that include other options). This rule does not apply to linker options.

  • The -features, -I, -l, -L, -library, -pti, -R, -staticlib, -U, -verbose, and -xprefetch options accumulate, they do not override.

  • The -D option accumulates. However, multiple -D options for the same name override each other.

Source files, object files, and libraries are compiled and linked in the order in which they appear on the command line.

This is done so you can do things like override the expansion of arguments like -fast, which expands to about 10 separate arguments.

You should use the -xarch=aes flag - either last or as the only -xarch=... option.
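
To make the "rightmost option wins" rule concrete, here is a small shell sketch. It does not invoke the compiler; `effective_xarch` is a hypothetical helper that only mimics the override behavior described above by scanning an argument list and keeping the last -xarch value it sees.

```shell
#!/bin/sh
# effective_xarch (hypothetical helper): print the value of the last
# -xarch=... option in the argument list. Illustration only; it models
# the left-to-right override rule and never runs CC.
effective_xarch() {
    last=""
    for arg in "$@"; do
        case "$arg" in
            -xarch=*) last="${arg#-xarch=}" ;;   # later options replace earlier ones
        esac
    done
    printf '%s\n' "$last"
}

effective_xarch -xarch=sse2 -xarch=aes -xarch=sse4_2   # prints sse4_2
effective_xarch -xarch=sse2 -xarch=sse4_2 -xarch=aes   # prints aes
```

The first call corresponds to the command line in the question, the second to the reordered one suggested here.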

Andrew Henle
  • OK, I was looking for something like that based on experience with GNU's single pass LD. Thank you again. Man, I am so low on the Sun Studio learning curve... – jww Jun 10 '16 at 12:51
  • How do I handle RDRAND, RDSEED and BMI2? There can only be one "last one". For example, I have a MacBook Pro with AES but not RDRAND. I have an Asus with AES and RDRAND, but not RDSEED. I have a Qotom with AES, RDRAND, RDSEED and BMI2. – jww Jun 10 '16 at 12:54
0

I'm going to toss in an answer for those coming from GCC. In the GCC world, we do -march=native and GCC defines macros like -D__SSE2__, -D__SSE4_1__, -D__SSE4_2__, -D__AES__, -D__AVX__, -D__BMI__, etc.

SunCC does not behave like GCC. It does not define macros like __SSE2__, nor does it expose the value given to -xarch.

Here are the references to the relevant Sun Studio manuals and the -xarch options/instructions set choices:

Here's how we determine which flags we can use and then convert them to GCC-style preprocessor macros. It's awful, but I don't know how to get the code generated otherwise.

CXX=...
GREP=...
EGREP=...

X86_CPU_FLAGS=$(isainfo -v 2>/dev/null)
SUNCC_510_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[0-9]|5\.[2-9]|[6-9]\.)")
SUNCC_511_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[1-9]|5\.[2-9]|[6-9]\.)")
SUNCC_512_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[2-9]|5\.[2-9]|[6-9]\.)")
SUNCC_513_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[3-9]|5\.[2-9]|[6-9]\.)")

# Pick an -xarch value based on what the compiler version and the CPU support,
# and define the matching GCC-style macros along the way.
# Note: grep -c matches substrings, so "sse3" also matches "ssse3" and
# "avx" also matches "avx2". Later assignments override earlier ones.
SUNCC_XARCH=
if [[ ("$SUNCC_511_OR_ABOVE" -ne "0") ]]; then
    if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE2__"); SUNCC_XARCH=sse2; fi
    if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse3") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE3__"); SUNCC_XARCH=sse3; fi
    if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "ssse3") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSSE3__"); SUNCC_XARCH=ssse3; fi
    if [[ ("$SUNCC_512_OR_ABOVE" -ne "0") ]]; then
        if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse4.1") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE4_1__"); SUNCC_XARCH=sse4_1; fi
        if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse4.2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE4_2__"); SUNCC_XARCH=sse4_2; fi
        if [[ ("$SUNCC_513_OR_ABOVE" -ne "0") ]]; then
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "aes") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AES__"); SUNCC_XARCH=aes; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "pclmulqdq") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__PCLMUL__"); SUNCC_XARCH=aes; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "rdrand") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__RDRND__"); SUNCC_XARCH=avx_i; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "rdseed") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__RDSEED__"); SUNCC_XARCH=avx_i; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "avx") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AVX__"); SUNCC_XARCH=avx; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "avx2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AVX2__"); SUNCC_XARCH=avx2; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "bmi") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__BMI__"); SUNCC_XARCH=avx2; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "bmi2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__BMI2__"); SUNCC_XARCH=avx2; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "adx") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__ADX__"); SUNCC_XARCH=avx2_i; fi
        fi
    fi
fi
# Only add -xarch when a value was actually selected
if [[ -n "$SUNCC_XARCH" ]]; then PLATFORM_CXXFLAGS+=("-xarch=$SUNCC_XARCH"); fi
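
Because grep -c matches substrings, a test for "sse3" also succeeds on a CPU that only reports "ssse3". A possible way around that is to compare whole words. The sketch below is an illustration under assumptions, not what we ship: `has_flag` and `pick_xarch` are hypothetical helpers, the ladder stops at avx for brevity, and the sample feature lists are made up rather than real isainfo output.

```shell
#!/bin/sh
# has_flag LIST FLAG (hypothetical helper): succeed only if FLAG appears in
# LIST as a whole whitespace-separated word, so "sse3" does not match "ssse3".
has_flag() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# pick_xarch LIST (hypothetical helper): walk feature:xarch pairs from lowest
# to highest and print the -xarch value of the last feature found in LIST.
pick_xarch() {
    best=""
    for pair in sse2:sse2 sse3:sse3 ssse3:ssse3 sse4.1:sse4_1 sse4.2:sse4_2 aes:aes avx:avx; do
        if has_flag "$1" "${pair%%:*}"; then
            best="${pair#*:}"
        fi
    done
    printf '%s\n' "$best"
}

# Made-up feature lists for illustration (not captured from a real machine)
pick_xarch "sse4.2 sse4.1 ssse3 sse3 sse2 sse"       # prints sse4_2
pick_xarch "aes sse4.2 sse4.1 ssse3 sse3 sse2 sse"   # prints aes
```

In a real script the list would come from `isainfo -v`, and the ladder would still need to be gated on the compiler version as in the script above.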

The gyrations above allow us to do things like this (except we need SSE2 through ADX).

#if (_MSC_VER >= 1700) || defined(__RDRND__)
    uint64_t val;
    if(_rdrand64_step(&val))
    {
        // Use RDRAND value
    }
#endif

Without the gyrations, we continually crash the 12.1 through 12.3 compilers during testing with the inline assembly and intrinsics.

Running the script gives us the recipe for CFLAGS and CXXFLAGS. The output below is from a 4th gen Core i5. Xeons produce different results, as does a 5th gen Core i5. For example, a 5th gen Core i5 will have ADX and use -xarch=avx_i.

Pathname: /opt/solstudio12.2/bin/CC (symlinked)
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -xarch=ssse3

Pathname: /opt/solarisstudio12.3/bin/CC (symlinked)
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__ -xarch=sse4_2

Pathname: /opt/solarisstudio12.4/bin/CC
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__ -D__AES__ -D__PCLMUL__ -D__RDRND__ -D__AVX__ -xarch=avx
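
The recipes differ per compiler because each group of flags is gated on the C++ compiler version. As an alternative to counting regex matches, the version could be pulled out of the banner and compared numerically. This is only a sketch: `version_at_least` is a hypothetical helper, and the banner string is an assumed example of what `CC -V` prints, so check it against your own compiler.

```shell
#!/bin/sh
# version_at_least BANNER MAJOR MINOR (hypothetical helper): succeed if the
# "C++ x.y" version found in BANNER is at least MAJOR.MINOR.
version_at_least() {
    banner=$1 want_major=$2 want_minor=$3
    # Extract the first x.y that follows "C++ " in the banner text
    ver=$(printf '%s\n' "$banner" |
          sed -n 's/.*C++ \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -n "$ver" ] || return 1
    major=${ver%%.*}
    minor=${ver#*.}
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Assumed banner format; in a real script use: banner=$("$CXX" -V 2>&1)
banner='CC: Sun C++ 5.12 SunOS_i386 2011/11/16'
if version_at_least "$banner" 5 12; then echo "5.12 or above"; fi
if version_at_least "$banner" 5 13; then echo "5.13 or above"; fi
```

With the assumed banner, only the first message is printed. Comparing the minor number as an integer avoids having to enumerate digit ranges in a pattern.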

...
jww