12

Is this always technically correct:

unsigned abs(int n)
{
    if (n >= 0) {
        return n;
    } else {
        return -n;
    }
}

It seems to me that, if -INT_MIN > INT_MAX, the "-n" expression could overflow when n == INT_MIN, since -INT_MIN would then fall outside the range of int. But on my compiler this seems to work ok... is this an implementation detail or a behaviour that can be relied upon?

Longer version

A bit of context: I'm writing a C++ wrapper for the GMP integer type (mpz_t), taking inspiration from the existing GMP C++ wrapper (called mpz_class). When handling addition of an mpz_t with signed integers there is code like this:

static void eval(mpz_ptr z, signed long int l, mpz_srcptr w)
{
  if (l >= 0)
    mpz_add_ui(z, w, l);
  else
    mpz_sub_ui(z, w, -l);
}

In other words, if the signed integer is positive it is added using the unsigned addition routine, and if it is negative it is added using the unsigned subtraction routine. Both *_ui routines take an unsigned long as their last argument. Is the expression

-l

at risk of overflowing?

bluescarni

7 Answers

11

If you want to avoid the overflow, you should first cast n to an unsigned int and then apply the unary minus to it.

unsigned abs(int n) {
  if (n >= 0)
    return n;
  /* convert first, then negate: unsigned negation wraps modulo 2^x, no UB */
  return -((unsigned)n);
}

In your original code the negation happens before the type conversion, so the behavior is undefined if n < -INT_MAX.

Negating an unsigned expression can never overflow. Instead the result is reduced modulo 2^x, where x is the number of bits in the unsigned type.
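
To make the two modulo steps concrete, here is a minimal sketch (the concrete values assume a 32-bit int and two's complement, so they are illustrative rather than guaranteed by the standard):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int n = INT_MIN;              /* -2147483648 on a 32-bit two's-complement int */
    unsigned u = (unsigned)n;     /* congruent to n modulo 2^32: 2147483648u */
    unsigned a = -u;              /* 2^32 - 2147483648 = 2147483648u */
    printf("%u\n", a);            /* prints 2147483648, i.e. the magnitude of INT_MIN */
    return 0;
}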

Roland Illig
  • I'm not sure I understand this fully... Does this behaviour rely on two's complement? – bluescarni Dec 27 '10 at 01:39
  • 2
    No it doesn't. It works in any environment that conforms to ISO C90 or to ISO C99, and neither of these standards requires two's complement arithmetics. The trick is to avoid any dependency on negative integers by computing the interesting case completely in unsigned arithmetics. – Roland Illig Dec 27 '10 at 01:53
  • 1
    Ok, maybe I'm slowly understanding this... Let me try: 1) after the cast, the unsigned value is congruent modulo 2**nbits to the original value 2) with the minus operator another modulo operation is performed – bluescarni Dec 27 '10 at 02:20
  • 2
    Ok now I got also the minus part, quoting from the C++ standard: "The negative of an unsigned quantity is computed by subtracting its value from 2**n , where n is the number of bits in the promoted operand". – bluescarni Dec 27 '10 at 02:25
  • Err, is the behavior of casting negative ints to unsigned any more guaranteed than just allowing the overflow? – ysth Dec 27 '10 at 02:26
  • 2
    Apparently so, at least in C++ (4.7.2): "If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2**n where n is the number of bits used to represent the unsigned type)". – bluescarni Dec 27 '10 at 02:29
  • Then this is the way to go, apparently. – ysth Dec 27 '10 at 03:03
3

There is no such thing as an overflow of unsigned integers in C. Arithmetic on them is defined as computation modulo their maximum value plus one; they may "wrap", but technically this is not considered overflow. So the conversion part of your code is fine, although in extreme cases you might encounter surprising results.

The only point where you could have overflow in your code is the unary - applied to a signed type. There is exactly one value of a signed type that might not have a positive counterpart: the minimum value. For that case you'd have to do a special check, e.g. for int (a fuller sketch follows below):

if (INT_MIN < -INT_MAX && n == INT_MIN) /* do something special */
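
For illustration, a complete function built around that check might look like this (a minimal sketch; the name my_abs is made up, and the special case relies on unsigned negation being well defined):

#include <limits.h>

unsigned my_abs(int n)
{
    if (INT_MIN < -INT_MAX && n == INT_MIN)       /* minimum value has no positive counterpart */
        return -(unsigned)n;                      /* convert first, then negate: well defined */
    return (n < 0) ? (unsigned)-n : (unsigned)n;  /* here -n cannot overflow */
}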
Jens Gustedt
2

Most computers today use a two's complement representation, which means the negative range is one larger than the positive one, for example -128 to 127 for an 8-bit type. That means that if you can represent a positive number, you can represent its negative counterpart without worry.
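
For reference, a tiny sketch that prints the asymmetric limits (the concrete values in the comment assume a 32-bit two's-complement int):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Typically prints -2147483648 and 2147483647: the minimum value
       has no positive counterpart of type int. */
    printf("INT_MIN = %d\nINT_MAX = %d\n", INT_MIN, INT_MAX);
    return 0;
}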

dutt
  • 1
    I think he's asking about the opposite case; namely, whether converting a given negative number to a positive one might overflow in some cases. – Justin Spahr-Summers Dec 27 '10 at 01:04
  • 1
    Doesn't this mean that when doing abs(-128), it will try to build the integer +128, which is not representable? – bluescarni Dec 27 '10 at 01:04
  • @bluescami: yes, and the +128 (in this imaginary 8-bit int system) overflows to -128. – ysth Dec 27 '10 at 01:16
  • But as far as I remember, signed integer overflowing is undefined behaviour in C/C++? – bluescarni Dec 27 '10 at 01:35
  • @Justin: Ah, should've read the question more carefully then. Owell I hope he learnt something from the answer in any case. – dutt Dec 27 '10 at 01:40
  • @bluescami: Hm, I think it's defined to wrap, -129 becomes +127 and +128 becomes -128 etc. Might be mistaken though. – dutt Dec 27 '10 at 01:42
0

This should avoid undefined behavior and work with all representations of signed int (2's complement, 1's complement, sign and magnitude):

unsigned myabs(int v)
{
  /* for negative v, v+1 cannot overflow and -(v+1) is non-negative */
  return (v >= 0) ? (unsigned)v : (unsigned)-(v+1)+1;
}

Modern compilers are able to remove the redundant -1+1 and recognize the idiom for computing the absolute value of a signed integer.

Here's what gcc produces:

_myabs:
    movl    4(%esp), %eax    # load v
    cltd                     # sign-extend: edx = (v < 0) ? -1 : 0
    xorl    %edx, %eax       # flip all bits if v was negative
    subl    %edx, %eax       # subtract -1 (i.e. add 1) if v was negative
    ret
Alexey Frunze
0

Maybe this can cope with the asymmetrical range of 2's-complement numbers:

#include <limits.h>

unsigned int abs(int n){

  unsigned int m;

  if(n == INT_MIN)
    m = INT_MAX + 1UL;  /* computed in unsigned long, so no signed overflow */
  else if(n < 0)
    m = -n;
  else 
    m = n;

  return m;
}
bruce
  • This would work assuming that _MAX and _MIN differ at most by 1 (but of course can be generalised). – bluescarni Dec 27 '10 at 02:28
  • 3
    They do differ by at most one. C allows only 3 possible choices of signed representation: twos complement, ones complement, and sign/magnitude (with differences of 1, 0, and 0, respectively). – R.. GitHub STOP HELPING ICE Dec 28 '10 at 04:10
  • @R.. Thanks for the info, I meant to ask that sooner or later :) – bluescarni Dec 28 '10 at 04:16
  • @bruce: You have your types/limits mismatched. Change `LONG_MIN` to `INT_MIN` and `LONG_MAX` to `INT_MAX`. You should probably also correct the first case to use `-(unsigned)INT_MIN` instead of `INT_MAX+1UL` so it works on any representation. – R.. GitHub STOP HELPING ICE Dec 28 '10 at 06:02
  • @R.. Thank you. But I wonder about the difference between 'INT_MAX + 1' and '-INT_MAX'; doesn't the former work? – bruce Dec 28 '10 at 12:58
  • `-INT_MIN` is undefined behavior in the case of twos complement, because the value of applying unary negation to `INT_MIN` is greater than `INT_MAX` and thus an overflow. `INT_MAX+1` is always undefined behavior because it overflows. `INT_MAX+1U` is well-defined though. – R.. GitHub STOP HELPING ICE Dec 28 '10 at 14:12
-1

Yes, it will overflow, to itself.

#include <stdio.h>
#include <limits.h>
int main(int argc, char**argv) {
    int foo = INT_MIN;
    if (-foo == INT_MIN) printf("overflow\n");
    return 0;
}

prints "overflow"

However, this is merely typical behavior, not required by the standard. If you wish to play it safe, see the accepted answer for how.

ysth
  • Is this defined by the standard? – Justin Spahr-Summers Dec 27 '10 at 01:17
  • Or rather, it overflows to zero. And zero just happens to have the nice property that it is neither negative nor positive. So trying to find the negative value of zero would of course lead you straight back to zero. – slebetman Dec 27 '10 at 01:18
  • 5
    If it overflows, the behavior is undefined. – Roland Illig Dec 27 '10 at 01:22
  • @Roland Illig: got a citation for that? I also believe it is just how two's complement works. I don't know how wedded the C or C++ standards are to two's complement. – ysth Dec 27 '10 at 01:25
  • @slebetman: no, the --foo produces a legal int value; the -foo overflows. But because C regards the - as an operator, not part of the constant, "-2147483648" can provoke compilation warnings about 2147483648 not being a valid int constant. I removed the --foo so what overflows isn't confused. – ysth Dec 27 '10 at 01:27
  • ah, I missed that some thought foo became 0; clarified the code again. – ysth Dec 27 '10 at 01:33
  • 2
    I don't have a citation at hand, but I know that C doesn't require two's complement, and I think C++ follows C in this regard. When I'm at home again I can cite from ISO C99. – Roland Illig Dec 27 '10 at 01:37
  • 3
    C99 §6.5/5: "If an *exceptional condition* occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined." – Adam Rosenfield Dec 27 '10 at 01:53
-1

Very good question, which exposes the differences between C89, C99 and C++. So this is some commentary on these Standards.

In C89, where n is an int:

(unsigned)n

is not well defined for all n: there's no restriction on the conversion of signed or unsigned int except that the representation of a non-negative signed int is identical to that of an unsigned int of the same value, provided that value is representable.

This was considered a defect, and C99 unfortunately makes a faulty attempt to restrict the encoding to two's complement, one's complement, or sign-magnitude with the same number of bits. The C committee did not have much mathematical knowledge and completely botched the specification: on the one hand it is ill-formed due to a circular definition and therefore non-normative, and on the other hand, if you excuse this fault, it is a gross overconstraint which, for example, excludes a BCD representation (used in C on old IBM mainframes), and which also allows the programmer to hack the value of an integer by fiddling with bits of the representation (which is very bad).

C++ went to some trouble to provide a better specification, however it suffers the same circular definition fault.

Roughly speaking, the representation of a value v is an array of unsigned char with sizeof(v) elements. An unsigned char has a power-of-two number of values, and is required to be big enough to ensure it faithfully encodes any aliased data structure. The number of bits in an unsigned char is well defined as the binary log of the number of representable values.

The number of bits of any unsigned value is similarly well defined if it has a power-of-two number of values from 0 to 2^n-1, via the canonical positional encoding scheme.

Unfortunately, the committee wanted to ask if there were any "holes" in the representation. For example, could you have a 31-bit integer on an x86 machine? I say unfortunately, because this is a badly formed question, and the answer is similarly improper.

The proper way to ask this question is to ask if the representation is full. It is not possible to talk about "the bits of a representation" for signed integers because the specification does not go from the representation to the values, it goes the other way. This may confuse a lot of programmers who incorrectly think a representation is a mapping from underlying bits to some value: a representation is a mapping from the values to the bits.

A representation is full if it is a surjection, that is, it is onto the whole range of the representation space. If the representation is full then there are no "holes", that is, unused bits. However that is not all. A representation of 255 values to an array of 8 bits cannot be full, yet there are no bits which are unused. There are no holes.

The problem is this: consider an unsigned int, then there are TWO distinct bitwise representations. There is the well defined array of log base 2 bits determined from the canonical encoding, and then there is the array of bits of the physical representation given by the aliasing of an array of unsigned char. Even if this representation is full there is no correspondence between the two kinds of bits.

We all know that the "high order bits" of the logical representation can be at one end of the physical representation on some machines and the other on other machines: it's called endian-ness. But in fact there's no reason the bits couldn't be permuted in any order at all, in fact there's no reason the bits should line up at all! Just consider adding 1 modulo the maximum value plus 1 as the representation to see this.

So now the problem is that for signed integers there is no canonical logical representation, rather there are several common ones: two's complement, for example. However as above this is unrelated to the physical representation. The C committee just couldn't understand that the correspondence between the values and physical representation cannot be specified by talking about bits. It must be specified entirely by talking about the properties of functions.

Because this was not done, the C99 standard contains non-normative gibberish and consequently all of the rules for behaviour of signed and unsigned integer conversions are non-normative gibberish as well.

Therefore it is not clear that

(unsigned)n

will actually produce the desired result for negative values.

Yttrill
  • 4
    specifying integer representations as it was done might have been a mistake, but you're wrong here: conversion from signed to unsigned is defined in terms of values ("repeatedly adding or subtracting one more than the maximum value that can be represented in the new type") and thus well-defined – Christoph Dec 27 '10 at 12:25
  • 3
    Your rant may have merit, but the conclusion is wrong. The standard absolutely specifies the result of the conversion to unsigned as reduction modulo one plus the maximum possible value in the destination type. – R.. GitHub STOP HELPING ICE Dec 28 '10 at 04:13