Is there a C snippet that computes overflow-safe addition efficiently without using compiler builtins?

Question

Here is a C function that adds an int to another, failing if overflow would happen:

int safe_add(int *value, int delta) {
        if (*value >= 0) {
                if (delta > INT_MAX - *value) {
                        return -1;
                }
        } else {
                if (delta < INT_MIN - *value) {
                        return -1;
                }
        }

        *value += delta;
        return 0;
}

Unfortunately it is not optimized well by GCC or Clang:

safe_add(int*, int):
        movl    (%rdi), %eax
        testl   %eax, %eax
        js      .L2
        movl    $2147483647, %edx
        subl    %eax, %edx
        cmpl    %esi, %edx
        jl      .L6
.L4:
        addl    %esi, %eax
        movl    %eax, (%rdi)
        xorl    %eax, %eax
        ret
.L2:
        movl    $-2147483648, %edx
        subl    %eax, %edx
        cmpl    %esi, %edx
        jle     .L4
.L6:
        movl    $-1, %eax
        ret

This version with __builtin_add_overflow()

int safe_add(int *value, int delta) {
        int result;
        if (__builtin_add_overflow(*value, delta, &result)) {
                return -1;
        } else {
                *value = result;
                return 0;
        }
}

is optimized better:

safe_add(int*, int):
        xorl    %eax, %eax
        addl    (%rdi), %esi
        seto    %al
        jo      .L5
        movl    %esi, (%rdi)
        ret
.L5:
        movl    $-1, %eax
        ret

but I'm curious if there's a way without using builtins that will get pattern-matched by GCC or Clang.

I see there is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 in the context of multiplication. But addition should be much easier to pattern-match. I'll report it. — Tavian Barnes, Feb 14 '20 at 15:47
Reported: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93742 — Tavian Barnes, Feb 14 '20 at 16:10

Jens Gustedt · Answer 1 · 2020-02-14T16:34:22.073

5

Tthe best one I came up with, if you don't have access to the overflow flag of the architecture, is to do things in unsigned. Just think of all bit arithmetic here in that we are only interested in the highest bit, which is the sign bit when interpreted as signed values.

(All that modulo sign errors, I didn't check this thouroughly, but I hope the idea is clear)

#include <stdbool.h>

bool overadd(int a[static 1], int b) {
  unsigned A = a[0];
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  bool ret = (AeB & AuAB) > M;

  if (!ret) a[0] += b;
  return ret;
}

If you find a version of the addition that is free of UB, such as an atomic one, the assembler is even without branch (but with a lock prefix)

#include <stdbool.h>
#include <stdatomic.h>
bool overadd(_Atomic(int) a[static 1], int b) {
  unsigned A = a[0];
  atomic_fetch_add_explicit(a, b, memory_order_relaxed);
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  bool ret = (AeB & AuAB) > M;
  return ret;
}

So if we had such an operation, but even more "relaxed" this could improve the situation even further.

Take3: If we use a special "cast" from the unsigned result to the signed one, this now is branch free:

#include <stdbool.h>
#include <stdatomic.h>

bool overadd(int a[static 1], int b) {
  unsigned A = a[0];
  //atomic_fetch_add_explicit(a, b, memory_order_relaxed);
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  unsigned res = (AeB & AuAB);
  signed N = M-1;
  N = -N - 1;
  a[0] =  ((AB > M) ? -(int)(-AB) : ((AB != M) ? (int)AB : N));
  return res > M;
}

edited Feb 14 '20 at 16:34

answered Feb 14 '20 at 08:40

Jens Gustedt

76,821
6
102
177

2

Not the DV, but I believe that the second XOR shouldn't be negated. See e.g. [this attempt](https://wandbox.org/permlink/1zjGgGYaIndQsUYh) to test all the proposals. – Bob__ Feb 14 '20 at 10:01
I tried something like this but couldn't get it to work. Looks promising but I wish GCC optimized the idiomatic code. – R.. GitHub STOP HELPING ICE Feb 14 '20 at 12:39
@Bob__, yes you are right, I am particularly bad in keeping the signs correct. – Jens Gustedt Feb 14 '20 at 14:08
@R..GitHubSTOPHELPINGICE That is probably quite difficult to achieve in the current ecosystem. There are people that defend the "undefinedness" of overflow with tooth and nails. – Jens Gustedt Feb 14 '20 at 14:14
Nice solve. +1 If the operands'signs are different, it can't overflow and otherwise there's overflow exactly if the result changed sign. I think it assumes two's complement, though, no? If so, I believe the nonatomic version is easily made branchless by replacing `if (!ret) a[0] += b;` with an unconditional assignment of the result of the unsigned addition converted to `int` (`a[0] = (int)AB;`). – Petr Skocik Feb 14 '20 at 14:37
1

@PSkocik, no this does not depend on the sign representation, the computation is entirely done as `unsigned`. But it depends on the fact that the unsigned type has not just the sign bit masked out. (Both are now guaranteed in C2x, that is, hold for all archs that we could find). Then, you can't cast the `unsigned` result back if it is greater than `INT_MAX`, that would be implementation defined and may raise a signal. – Jens Gustedt Feb 14 '20 at 14:44
OK, thanks. I thought two's complement and an always-defined signal-free unsigned-to-int conversion were a package deal. :/ – Petr Skocik Feb 14 '20 at 14:51
1

@PSkocik, no unfortunately not, that seemed to revolutianary to the committee. But here is a "Take3" that actual comes out without branches on my machine. – Jens Gustedt Feb 14 '20 at 15:15
1

Sorry to bother you again, but I think you should change Take3 into something like [this](https://godbolt.org/z/qqgs_K) to obtain correct results. It seems [promising](http://quick-bench.com/_oux0YwXptT_YRShRvAL_cUjdWI), though. – Bob__ Feb 14 '20 at 15:46
I see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2330.pdf. I'd rather have a revolution, but Take3 is nice too. (Pedantically speaking, though, I believe you also need to special-case for `res==(unsigned)INT_MIN` or else with that you're in the `-(int)-res` "branch" and you trigger both the potentially signal-raising implementation-defined conversion and then actual UB. gcc and clang seem to optimize it just as nice, though: https://gcc.godbolt.org/z/zpTrNX) – Petr Skocik Feb 14 '20 at 15:46
@Bob__ Take 3 is about unconditionally assigning the computed value after proper no-op adjustments (that look like, but after optimization are not, a branch) to avoid the possibility of an implementation defined signal. You've re-added the branch (shows up as a condtional move (cmovns). – Petr Skocik Feb 14 '20 at 15:58
@PSkocik That's disappointing, I know, but unless you come up with a better fix, the [results](https://wandbox.org/permlink/FIzStLkatsc9Ns0m) are not correct. – Bob__ Feb 14 '20 at 16:12
@Bob__ Looks like Jens Gustedt has a typo there and meant to write `a[0] = (res > M) ? -(int)(-AB) : (int)AB;`, which after my adjustment would be `a[0] = AB>M ? ((unsigned)INT_MIN==AB ? INT_MIN : -(int)-AB) : (int)AB;`. This should give you the same results in `a[0]` if there's no overflow, but it will give you different ones if there is overflow, because then it will uncoditionally store the two's complement-overflown value in `a[0]`--the unconditional store is the whole essence of it being a branchless implementation. – Petr Skocik Feb 14 '20 at 16:27
@PSkocik, yes you are completely right. And this changes the interface specification a bit :( Well I think it is time for me to stop. – Jens Gustedt Feb 14 '20 at 16:35
I don't mind if it stores something in the overflow case. It would be nice to match `__builtin_add_overflow()` like that actually. – Tavian Barnes Feb 14 '20 at 17:19

Alexander Cherepanov · Answer 2 · 2020-02-18T23:09:10.117

The situation with signed operations is much worse than with unsigned ones, and I see only one pattern for signed addition, only for clang and only when a wider type is available:

int safe_add(int *value, int delta)
{
    long long result = (long long)*value + delta;

    if (result > INT_MAX || result < INT_MIN) {
        return -1;
    } else {
        *value = result;
        return 0;
    }
}

clang gives exactly the same asm as with __builtin_add_overflow:

safe_add:                               # @safe_add
        addl    (%rdi), %esi
        movl    $-1, %eax
        jo      .LBB1_2
        movl    %esi, (%rdi)
        xorl    %eax, %eax
.LBB1_2:
        retq

Otherwise, the simplest solution I can think of is this (with the interface as Jens used):

_Bool overadd(int a[static 1], int b)
{
    // compute the unsigned sum
    unsigned u = (unsigned)a[0] + b;

    // convert it to signed
    int sum = u <= -1u / 2 ? (int)u : -1 - (int)(-1 - u);

    // see if it overflowed or not
    _Bool overflowed = (b > 0) != (sum > a[0]);

    // return the results
    a[0] = sum;
    return overflowed;
}

gcc and clang generate very similar asm. gcc gives this:

overadd:
        movl    (%rdi), %ecx
        testl   %esi, %esi
        setg    %al
        leal    (%rcx,%rsi), %edx
        cmpl    %edx, %ecx
        movl    %edx, (%rdi)
        setl    %dl
        xorl    %edx, %eax
        ret

We want to compute the sum in unsigned, so unsigned have to be able to represent all values of int without any of them sticking together. To easily convert the result from unsigned to int, the opposite is useful too. Overall, two's complement is assumed.

On all popular platforms I think we can convert from unsigned to int by a simple assignment like int sum = u; but, as Jens mentioned, even the latest variant of the C2x standard allows it to raise a signal. The next most natural way is to do something like that: *(unsigned *)&sum = u; but non-trap variants of padding apparently could differ for signed and unsigned types. So the example above goes the hard way. Fortunately, both gcc and clang optimize this tricky conversion away.

P.S. The two variants above could not be compared directly as they have different behavior. The first one follows the original question and doesn't clobber the *value in case of overflow. The second one follows the answer from Jens and always clobbers the variable pointed to by the first parameter but it's branchless.

Replaced equality by xor in the overflow check to get better asm with gcc. Added asm. — Alexander Cherepanov, Feb 18 '20 at 21:31

score 1 · Answer 3 · answered Feb 13 '20 at 17:55

1

the best version I can come up with is:

int safe_add(int *value, int delta) {
    long long t = *value + (long long)delta;
    if (t != ((int)t))
        return -1;
    *value = (int) t;
    return 0;
}

which produces:

safe_add(int*, int):
    movslq  %esi, %rax
    movslq  (%rdi), %rsi
    addq    %rax, %rsi
    movslq  %esi, %rax
    cmpq    %rsi, %rax
    jne     .L3
    movl    %eax, (%rdi)
    xorl    %eax, %eax
    ret
.L3:
    movl    $-1, %eax
    ret

answered Feb 13 '20 at 17:55

Iłya Bursov

23,342
4
33
57

I'm surprised even that doesn't use the overflow flag. Still much better than the explicit range checks, but it doesn't generalize to adding long longs. – Tavian Barnes Feb 13 '20 at 18:25
@TavianBarnes you're right, unfortunately there is no good way to use overflow flags in c (except compiler specific builtins) – Iłya Bursov Feb 13 '20 at 18:32
1

This code suffers from signed overflow, which is Undefined Behaviour. – emacs drives me nuts Feb 14 '20 at 08:58
@emacsdrivesmenuts, you are right, the cast in the comparisson can overflow. – Jens Gustedt Feb 14 '20 at 14:29
1

@emacsdrivesmenuts The cast is not undefined. When out of the range of `int`, a cast from a wider type will either produce an implementation-defined value or raise a signal. All implementations I care about define it to preserve the bit pattern which does the right thing. – Tavian Barnes Feb 14 '20 at 15:32
At least with GCC you should use `-fwrapv`. https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fwrapv There **are** situations where GCC makes use of signed overflow. Been there, seen it, bought the T-shirt. – emacs drives me nuts Feb 14 '20 at 15:44
@emacsdrivesmenuts That flag doesn’t seem to be relevant for casting, only for actual operations on integers leading to overflow. See https://godbolt.org/z/ZJKZTM. – Konrad Rudolph Feb 14 '20 at 16:49

Konrad Rudolph · Answer 4 · 2020-02-14T15:36:20.097

0

I could get the compiler to use the sign flag by assuming (and asserting) a two’s complement representation without padding bytes. Such implementations should yield the required behaviour in the line annotated by a comment, although I can’t find a positive formal confirmation of this requirement in the standard (and there probably isn’t any).

Note that the following code only handles positive integer addition, but can be extended.

int safe_add(int* lhs, int rhs) {
    _Static_assert(-1 == ~0, "integers are not two's complement");
    _Static_assert(
        1u << (sizeof(int) * CHAR_BIT - 1) == (unsigned) INT_MIN,
        "integers have padding bytes"
    );
    unsigned value = *lhs;
    value += rhs;
    if ((int) value < 0) return -1; // impl. def., 6.3.1.3/3
    *lhs = value;
    return 0;
}

This yields on both clang and GCC:

safe_add:
        add     esi, DWORD PTR [rdi]
        js      .L3
        mov     DWORD PTR [rdi], esi
        xor     eax, eax
        ret
.L3:
        mov     eax, -1
        ret

edited Feb 14 '20 at 15:36

answered Feb 14 '20 at 14:18

Konrad Rudolph

530,221
131
937
1,214

I think that the cast in the comparision is undefined. But you could get away with this as I do in my answer. But then also, all the fun is in being able to cover all the cases. Your `_Static_assert` serves not much of a purpose, because this is trivially true on any current architecture, and will even be imposed for C2x. – Jens Gustedt Feb 14 '20 at 14:31
2

@Jens Actually it seems the cast is implementation-defined, not undefined, if I am reading (ISO/IEC 9899:2011) 6.3.1.3/3 correctly. Can you double-check that? (However, extending this to negative arguments makes the whole thing rather convoluted and ultimately similar to your solution.) – Konrad Rudolph Feb 14 '20 at 15:01
1

You are right, it is iplementation defined, but may also raise a signal :( – Jens Gustedt Feb 14 '20 at 15:10
@Jens Yeah, technically I guess a two’s complement implementation might still contain padding bytes. Maybe the code should test for this by comparing the theoretical range to `INT_MAX`. I’ll edit the post. But then again I don’t think this code should be used in practice anyway. – Konrad Rudolph Feb 14 '20 at 15:20

Is there a C snippet that computes overflow-safe addition efficiently without using compiler builtins?

4 Answers4