
It appears GCC will happily auto-vectorize simple examples and emit SSE instructions. Is there any way to make it emit MMX instructions only?

For example, if I compile the following on Godbolt:

int sumint(int *arr) {
    int sum = 0;
    for (int i=0 ; i<2048 ; i++){
        sum += arr[i];
    }
    return sum;
}

compiling on GCC 9.2 with -mmmx -O3 -m32 -msse2, I get

sumint:
        mov     eax, DWORD PTR [esp+4]
        pxor    xmm0, xmm0
        lea     edx, [eax+8192]
.L2:
        movdqu  xmm2, XMMWORD PTR [eax]
        add     eax, 16
        paddd   xmm0, xmm2
        cmp     edx, eax
        jne     .L2
        movdqa  xmm1, xmm0
        psrldq  xmm1, 8
        paddd   xmm0, xmm1
        movdqa  xmm1, xmm0
        psrldq  xmm1, 4
        paddd   xmm0, xmm1
        movd    eax, xmm0
        ret

But without SSE (i.e. -mmmx -O3 -m32 -mno-sse2), it falls back to using only general-purpose registers, with no MMX instructions:

sumint:
        mov     eax, DWORD PTR [esp+4]
        xor     edx, edx
        lea     ecx, [eax+8192]
.L2:
        add     edx, DWORD PTR [eax]
        add     eax, 4
        cmp     eax, ecx
        jne     .L2
        mov     eax, edx
        ret

I wanted to run some benchmarks comparing code compiled for just the x87 FPU, MMX, SSE, and SSE2, but if GCC won't emit MMX instructions, there won't be any difference between compiling for x87 and x87+MMX.

Ant6n
  • IDK if older versions of GCC ever knew how to auto-vectorize for MMX. Maybe try the oldest GCC on Godbolt, although that's probably still not going to work. – Peter Cordes Sep 29 '19 at 02:08
  • Adding the flag `-fopt-info-vec-missed` reports: `missed: not vectorized: relevant stmt not supported: sum_10 = _4 + sum_15;` so MMX auto-vectorization is probably just not implemented – chtz Sep 29 '19 at 14:43

1 Answer


GCC can't autovectorize using MMX or 3DNow! because it lacks the ability to properly insert EMMS/FEMMS. You have to use ICC for MMX. See https://gcc.gnu.org/ml/gcc-patches/2004-12/msg01955.html
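If the goal is only to get MMX instructions into a benchmark, one workaround is to write the loop by hand with the `<mmintrin.h>` intrinsics and issue EMMS yourself via `_mm_empty()`. The following is a minimal sketch, not something from the linked patch discussion: the function name `sumint_mmx` and the length parameter `n` are hypothetical, `n` is assumed to be even, and a scalar fallback is included for builds where `__MMX__` is not defined.

```c
#include <assert.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical hand-vectorized version of sumint().
   Assumption: n is even, so the array splits into pairs of ints. */
int sumint_mmx(const int *arr, int n) {
    assert(n % 2 == 0);
#if defined(__MMX__)
    __m64 acc = _mm_setzero_si64();              /* two 32-bit partial sums */
    for (int i = 0; i < n; i += 2) {
        /* Pack two ints into one 64-bit MMX value (lane order is
           irrelevant for a sum). */
        __m64 v = _mm_set_pi32(arr[i + 1], arr[i]);
        acc = _mm_add_pi32(acc, v);              /* paddd on MMX registers */
    }
    /* Horizontal add: move the high lane down and add it to the low lane. */
    acc = _mm_add_pi32(acc, _mm_srli_si64(acc, 32));
    int sum = _mm_cvtsi64_si32(acc);             /* extract the low 32 bits */
    _mm_empty();                                 /* EMMS: release the x87 state */
    return sum;
#else
    /* Scalar fallback when MMX is not enabled for this build. */
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i];
    }
    return sum;
#endif
}
```

Calling `sumint_mmx(arr, 2048)` corresponds to the loop in the question. To compare against the x87-only build, compile with something like `-O3 -m32 -mmmx -mno-sse` and check the assembly; on 64-bit targets the compiler may lower these intrinsics to SSE registers instead of MMX, so inspect the output before benchmarking.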

Zuxy
  • `vzeroupper` after using YMM registers (in AVX instructions) is the same kind of problem, and it is solved by now. If anyone cared about obsolete MMX, it might not be a *huge* amount of work to modify GCC to insert EMMS the same way, allowing MMX auto-vec to be re-enabled (if it hasn't been pruned out or made useless by bit-rot after being disabled for 15 years). Unlikely to be worth anyone's time in real life, just an observation. And thanks for posting this; I have sometimes wondered why modern GCC doesn't seem able to auto-vec with MMX even with `-march=pmmx` – Peter Cordes Jan 12 '20 at 06:56