I've been using the excellent godbolt.org to determine what gcc does and doesn't vectorize, but I can't find any way of getting it to vectorize a min(X,Y) function into a PMINUQ etc.

Looking at the sse.md machine description language file in the gcc source, I can see a block around lines 12355 onwards that mentions p<maxmin_int><ssemodesuffix>, which looks to me as though it ought to output PMINUQ etc. So I can't see any reason why compiling for this pattern with -msse4 -msse4.1 shouldn't just work.

However, this part of the md also has a "&& " line inside it, which seems (?) to imply that this opcode only works on AVX-style wide targets.

So, I can't tell whether this is a hardware limitation, a compiler/md bug, a godbolt.org problem with -msse4.1, or something else entirely. Can anyone help me narrow this down a bit?

gcc -msse4 -msse4.1 -msse4.2 -O3 -fopt-info-vec-all

#include <stdint.h>

#define MAX_LOOPS 10000

uint64_t in_array[MAX_LOOPS];
uint64_t out_array[MAX_LOOPS];

void do_max(uint64_t maxval)
{
    for (int i=0; i<MAX_LOOPS; i++)
        out_array[i] = (in_array[i] < maxval) ? in_array[i] : maxval;
}

godbolt.org tells me I'm getting...

    pcmpeqq xmm0, xmm1
    pandn   xmm0, xmm2

...when I'm hoping for...

    pminuq  xmm0, xmm1
nickpelling

1 Answer

vpminuq requires AVX512: AVX512F for the 512-bit form, plus AVX512VL for the 128-bit and 256-bit forms. (https://www.felixcloutier.com/x86/pminud:pminuq)

SSE4.1 / AVX2 only has pminub/w/d. Try using arrays with 32-bit elements.
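
As a minimal sketch of that suggestion, assuming the data fits in 32 bits, here is the question's loop with 32-bit elements. The names in_array32, out_array32 and do_min32 are hypothetical and not from the original code. Built with something like gcc -O3 -msse4.1, the loop would be expected to vectorize to pminud, but that is worth confirming on godbolt.org rather than taking on trust.

```c
#include <stdint.h>

#define MAX_LOOPS 10000

/* Hypothetical 32-bit counterparts of the arrays in the question. */
uint32_t in_array32[MAX_LOOPS];
uint32_t out_array32[MAX_LOOPS];

/* Clamp every element to at most maxval, i.e. an unsigned 32-bit min.
   SSE4.1 has pminud for this element width, so the vectorizer has a
   direct instruction to use; no such instruction exists for 64-bit
   elements before AVX512. */
void do_min32(uint32_t maxval)
{
    for (int i = 0; i < MAX_LOOPS; i++)
        out_array32[i] = (in_array32[i] < maxval) ? in_array32[i] : maxval;
}
```

Adding -fopt-info-vec-all, as in the question, should report whether the loop was vectorized.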

Peter Cordes
  • Thanks very much indeed, Peter! I've been staring at that all evening, and it was on Felix Cloutier's site right in front of me all the time. You're a star! :-) – nickpelling Sep 23 '19 at 23:15
  • @nickpelling: AVX512 has cluttered the ISA manuals significantly; for people that aren't interested in AVX512 it's a lot of noise to wade through. It was easier to get a handle on what was AVX/AVX2 vs. SSE4 vs. SSE2 before those were added. I'd maybe suggest grabbing an older PDF of Intel's vol.2 manual from 2015 or so, while AVX512 was still in the "future extensions" manual. Or use their online intrinsics guide which lets you filter by tech and easily exclude AVX512; it's searchable by asm mnemonic. But it's not as good if you're looking for asm docs. – Peter Cordes Sep 24 '19 at 00:02