
See this simple minimum search (Godbolt):

float foo(const float *data, int n) {
    float v = data[0];
    for (int i = 1; i < n; i++) {
        float d = data[i];
        if (d < v) {
            v = d;
        }
    }
    return v;
}

Neither gcc nor clang auto-vectorizes this code with -O3. Adding -ffinite-math-only alone still doesn't make auto-vectorization happen. Only when I use both -ffinite-math-only and -fno-signed-zeros does the compiler auto-vectorize the code. Why is -fno-signed-zeros needed for auto-vectorization to kick in?

Marco Bonelli
geza
  • The two versions give different answers if the input is `{-0.0f, 0.0f, 0.0f, ... }`. You might trace through the vectorized code and see why it outputs positive 0 for this input. – Nate Eldredge Dec 06 '21 at 16:57
  • @NateEldredge: makes sense, thanks. Basically `foo` returns the sign of the first zero in this case. But if we process 4 interleaved float sub-streams simultaneously, we need to track which sub-stream had the first zero. That presumably makes the vectorized code slower, so it isn't really worth it. – geza Dec 06 '21 at 19:18
  • Indeed - or at least the compiler doesn't know how to do it. The fundamental issue is that vectorizing an "accumulate" operation is only really feasible if that operation is commutative, and minimum is not commutative in the presence of signed zeros. – Nate Eldredge Dec 06 '21 at 22:09
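
The point made in the comments can be illustrated with a small scalar model. This is only a sketch, not the code gcc or clang actually emits: `min_lanes4` is a hypothetical stand-in for the main loop of a 4-wide vectorized version (it assumes `n` is a multiple of 4 and skips the final horizontal reduction), and `input_a` / `input_b` are made-up test inputs.

```c
#include <assert.h>
#include <math.h>   /* signbit, useful for inspecting the results */

/* The loop from the question: with a strict <, ties keep the earlier value,
   so the sign of the first zero seen is the sign that gets returned. */
float min_scalar(const float *data, int n) {
    float v = data[0];
    for (int i = 1; i < n; i++) {
        if (data[i] < v) {
            v = data[i];
        }
    }
    return v;
}

/* Hypothetical model of a 4-wide vectorized main loop: lane k keeps the
   running minimum of data[k], data[k + 4], data[k + 8], ... */
void min_lanes4(const float *data, int n, float lane[4]) {
    assert(n >= 4 && n % 4 == 0);  /* simplification: no remainder loop */
    for (int k = 0; k < 4; k++) {
        lane[k] = data[k];
    }
    for (int i = 4; i < n; i += 4) {
        for (int k = 0; k < 4; k++) {
            if (data[i + k] < lane[k]) {
                lane[k] = data[i + k];
            }
        }
    }
}

/* Made-up inputs: both leave the lanes as {-0, +0, 1, 1}, yet the scalar
   loop returns -0 for input_a (first zero is at index 0) and +0 for
   input_b (first zero is at index 1, the -0 at index 4 comes later). */
const float input_a[8] = {-0.0f, 0.0f, 1.0f, 1.0f,  1.0f, 1.0f, 1.0f, 1.0f};
const float input_b[8] = { 1.0f, 0.0f, 1.0f, 1.0f, -0.0f, 1.0f, 1.0f, 1.0f};
```

Since both inputs end in the same lane state, any final reduction over the four lanes must produce the same value for both, so it disagrees with `min_scalar` on at least one of them unless each lane also tracks where its minimum came from. With -fno-signed-zeros the compiler is allowed to treat -0 and +0 as interchangeable, so this mismatch no longer matters.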
