Wrong result on modular arithmetic on ARM (Apple M1) with clang -O3 optimization

Question

I have been pulling my hair out for the last couple of days over this "innocuous" piece of code (a minimal reproducible example, part of a larger modular multiplication routine):

#include <iostream>
#include <limits>

using ubigint = unsigned long long int;
using bigint = long long int;

void modmul(bigint a, bigint b, ubigint p) {
    ubigint ua = a < 0 ? -a : a;
    ubigint ub = b < 0 ? -b : b;

    ua %= p;
    ub %= p;

    std::cout << "ua: " << ua << '\n';
}

int main() {
    bigint minbigint = std::numeric_limits<bigint>::min();
    bigint maxbigint = std::numeric_limits<bigint>::max();
    std::cout << "minbigint: " << minbigint << '\n';
    std::cout << "maxbigint:  " << maxbigint << '\n';

    modmul(minbigint, maxbigint, 2314); // expect ua: 2036, got ua: 0
}

I am compiling on macOS 11.4 with clang 12.0 installed from Homebrew:

clang version 12.0.0 
Target: arm64-apple-darwin20.5.0 
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin

When compiling with clang -O1, the program prints the expected result (2036; I checked with Wolfram Mathematica, Mod[9223372036854775808, 2314], and this is correct). However, when I compile with clang -O2 or clang -O3 (full optimization), the variable ua is somehow zeroed out (its value becomes 0). I am at a complete loss here and have no idea why this happens. IMO, there is no UB, no overflow, and nothing else dubious in this piece of code. I'd greatly appreciate any advice, or a confirmation if you can reproduce the issue on your side.

PS: the code behaves as expected on every other platform I tried (Windows/Linux/FreeBSD/Solaris), with any combination of compilers. I only get this error on Apple M1 with clang 12 (I didn't test other compilers on M1).

Well you say no overflow, but negating the most negative bigint is a bit shady. Does this still occur if you cast to ubigint *before* negating? (negating unsigned integers is safe after all) — harold, Jul 14 '21 at 02:31
@harold Harold, you're golden! This seems to be the problem indeed. Can you perhaps post an answer? This is one of those very dark corners of C (inherited by C++) most likely. Frankly, I don't know if this is UB or not... I wonder if I'll ever totally understand unsigned vs signed in C++. — vsoftco, Jul 14 '21 at 02:33
@harold The funny thing is that even using `std::abs` gives a wrong answer, i.e., `ubigint ua = std::abs(a)` when `a==minbigint`. — vsoftco, Jul 14 '21 at 02:47
@vsoftco use `-fsanitize=undefined` and you'll see the error immediately https://godbolt.org/z/b49zx1dKa. You can also use a static analyzer in this case — phuclv, Jul 14 '21 at 02:54
@phuclv Thanks! I am actually using it, `clang++ -fsanitize=undefined -O3 -Wall -Wextra test.cpp`, and unfortunately I'm not getting any warning on my platform. — vsoftco, Jul 14 '21 at 03:04
@phuclv However, it shows the issues on other platforms, so super useful! — vsoftco, Jul 14 '21 at 03:11
@phuclv I guess so, but at least now I get warnings on all other CI systems. Tricky stuff... — vsoftco, Jul 14 '21 at 03:12
I see it also with clang++ 12 on Ubuntu arm64, though not with clang++ 11. What's interesting is that -O2 inlines and constant-folds `modmul`, so that `main` just ends up with `std::cout << "ua: " << 0 << '\n';` and `modmul` itself isn't actually called at runtime. Usually you expect to get *more* intuitive behavior for things like overflow when computation is done at compile time, but in this case not so much. I do get an error from UBSan though. — Nate Eldredge, Jul 14 '21 at 05:58

mibu · Accepted Answer · 2021-07-14T16:21:44.223

UPDATE: As @harold pointed out in the comment section, negq and subq from 0 are exactly the same, so my discussion of negq and subq below is incorrect. Please disregard that part; sorry for not double-checking before posting the answer.

About the original question: I recompiled a slightly simpler version of the code on godbolt and found that the problematic optimization is in main, not in modmul. In main, clang sees that all the operands of modmul are constants, so it decides to evaluate modmul at compile time. When evaluating ubigint ua = a < 0 ? -a : a;, clang finds that -a is a signed integer overflow (UB), so it simply uses 0 as the result and prints that. This may seem like a radical thing to do, but it is legal because the behavior is undefined. Moreover, since the mathematically correct answer cannot be represented in a two's complement bigint, returning 0 is arguably as good (or as bad) as any other result.
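The fix suggested in the comments (convert to ubigint before negating) can be sketched as follows. This is only a minimal illustration, assuming a 64-bit long long; the helper names uabs, abs_mod and check_min_case are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <limits>

using ubigint = unsigned long long int;
using bigint = long long int;

// Hypothetical helper (not in the original code): magnitude of a signed value,
// computed entirely in unsigned arithmetic.
ubigint uabs(bigint a) {
    ubigint ua = static_cast<ubigint>(a);  // conversion is defined modulo 2^64
    return a < 0 ? 0 - ua : ua;            // unsigned subtraction wraps, no UB
}

// Hypothetical helper: |a| mod p, the quantity modmul prints as "ua".
ubigint abs_mod(bigint a, ubigint p) { return uabs(a) % p; }

// Sanity check for the case from the question (assumes 64-bit long long).
void check_min_case() {
    const bigint lo = std::numeric_limits<bigint>::min();
    assert(uabs(lo) == 9223372036854775808ULL);  // 2^63
    assert(abs_mod(lo, 2314) == 2036);           // value the question expects
}
```

Because no signed negation is ever performed, the compiler has no undefined behavior to exploit, whether the call is evaluated at compile time or at run time.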

OLD ANSWER BELOW

As someone pointed out in the comment section, the two lines below in your code have undefined behavior (signed integer overflow) when a or b is the most negative bigint.

    ubigint ua = a < 0 ? -a : a;
    ubigint ub = b < 0 ? -b : b;

If you wonder what exactly clang does under the hood to produce two different results at two different optimization levels, consider the following simple example.

using ubigint = unsigned long long int;
using bigint = long long int;

ubigint
negate(bigint a)
{
    ubigint ua = -a;
    return ua;
}

When compiled with -O0:

negate(long long):                             # @negate(long long)
        pushq   %rbp
        movq    %rsp, %rbp
        movq    %rdi, -8(%rbp)
        xorl    %eax, %eax
        subq    -8(%rbp), %rax  # Negation is performed here
        movq    %rax, -16(%rbp)
        movq    -16(%rbp), %rax
        popq    %rbp
        retq

When compiled with -O3:

negate(long long):                             # @negate(long long)
        movq    %rdi, %rax
        negq    %rax  # Negation is performed here
        retq

At -O0, clang uses a plain subq instruction, which subtracts the stored value of a from 0 (held in %rax) and produces a result with integer wrap-around behavior.

At -O3, clang can do better: it uses the negq instruction, which just replaces the operand with its two's complement (i.e. flips all the bits and adds 1). However, you can see that this optimization is only legal if signed integer overflow is undefined behavior (so the compiler can just ignore the overflow cases). If the standard required integer wrap-around behavior, clang would have to fall back to the unoptimized version.
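As the update at the top of this answer notes, subtracting from 0 and negq give the same result for every input. A minimal check of that identity is sketched below, done in unsigned 64-bit arithmetic so that nothing overflows; the helper names are hypothetical and serve only as illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; neither appears in the original answer.
// Mirrors `subq` from a zeroed register: 0 - x, wrapping modulo 2^64.
constexpr std::uint64_t sub_from_zero(std::uint64_t x) { return 0 - x; }
// Mirrors the two's complement definition of `negq`: flip all bits, add 1.
constexpr std::uint64_t flip_and_add_one(std::uint64_t x) { return ~x + 1; }

// Bit pattern of the most negative 64-bit signed integer.
constexpr std::uint64_t min_pattern = std::uint64_t{1} << 63;

// Both forms agree on that pattern, and negation maps it to itself.
static_assert(sub_from_zero(min_pattern) == flip_and_add_one(min_pattern), "forms agree");
static_assert(sub_from_zero(min_pattern) == min_pattern, "negation maps it to itself");

// Runtime check that the two forms agree for an arbitrary input.
void check_forms_agree(std::uint64_t x) {
    assert(sub_from_zero(x) == flip_and_add_one(x));
}
```

So the choice between the two instructions cannot explain the different outputs.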

That makes no sense to me. Subtracting from 0 and `neg` are *exactly* the same thing, not something that's only conditionally valid. `neg` is fine in the case of the most negative integer too. — harold, Jul 14 '21 at 14:16
@harold After double-checking my answer, I think you are right. My discussion of `negq` and `subq` above is incorrect. About the original question, I tried recompiling a simpler version of the code [godbolt](https://godbolt.org/z/38Gz1dobK) and found that the problematic optimization is not in the `modmul` function but in `main`. Here, clang does constant propagation, finds UB and decides not to call `modmul` at all. I think this is the reason why clang at -O3 produces 0. What do you think? — mibu, Jul 14 '21 at 15:32
That's probably it, because pretty much the only way that the negation can "go wrong" is if it does not happen at runtime (any reasonable way to implement it would also work for the most negative integer) — harold, Jul 14 '21 at 15:38
