254

I was reading Java's ArrayList source code and noticed some comparisons in if-statements.

In Java 7, the method grow(int) uses

if (newCapacity - minCapacity < 0)
    newCapacity = minCapacity;

In Java 6, grow didn't exist. The method ensureCapacity(int) however uses

if (newCapacity < minCapacity)
    newCapacity = minCapacity;

What was the reason behind the change? Was it a performance issue or just a style?

I could imagine that comparing against zero is faster, but performing a complete subtraction just to check whether it's negative seems a bit overkill to me. Also in terms of bytecode, this would involve two instructions (ISUB and IF_ICMPGE) instead of one (IFGE).

SpringLearner
  • 13,738
  • 20
  • 78
  • 116
dejvuth
  • 6,986
  • 3
  • 33
  • 36
  • 35
    @Tunaki How is `if (newCapacity - minCapacity < 0)` better than `if (newCapacity < minCapacity)` in terms of preventing overflow? – Eran Oct 15 '15 at 11:34
  • 3
    I wonder whether the mentioned sign overflow is indeed the reason. The subtraction seems more a candidate for overflow. The component maybe saying "this will nevertheless not overflow", maybe both variables being non-negative. – Joop Eggen Oct 15 '15 at 11:43
  • 12
    FYI, you believe that doing a comparison is faster than performing a "complete subtraction". In my experience, at the machine code level, usually comparisons are done by performing a subtraction, throwing away the result, and checking the resulting flags. – David Dubois Oct 15 '15 at 13:56
  • 6
    @David Dubois: the OP didn’t assume that comparison is faster than subtraction, but that comparison *with zero* might be faster than a comparison of two arbitrary values and also correctly assumes that this doesn’t hold when you are performing an actual subtraction first in order to get a value to compare with zero. That’s all quite reasonable. – Holger Oct 15 '15 at 16:02

4 Answers4

286

a < b and a - b < 0 can mean two different things. Consider the following code:

int a = Integer.MAX_VALUE;
int b = Integer.MIN_VALUE;
if (a < b) {
    System.out.println("a < b");
}
if (a - b < 0) {
    System.out.println("a - b < 0");
}

When run, this will only print a - b < 0. What happens is that a < b is clearly false, but a - b overflows and becomes -1, which is negative.

Now, having said that, consider that the array has a length that is really close to Integer.MAX_VALUE. The code in ArrayList goes like this:

int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
    newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
    newCapacity = hugeCapacity(minCapacity);

oldCapacity is really close to Integer.MAX_VALUE so newCapacity (which is oldCapacity + 0.5 * oldCapacity) might overflow and become Integer.MIN_VALUE (i.e. negative). Then, subtracting minCapacity underflows back into a positive number.

This check ensures that the if is not executed. If the code were written as if (newCapacity < minCapacity), it would be true in this case (since newCapacity is negative) so the newCapacity would be forced to minCapacity regardless of the oldCapacity.

This overflow case is handled by the next if. When newCapacity has overflowed, this will be true: MAX_ARRAY_SIZE is defined as Integer.MAX_VALUE - 8 and Integer.MIN_VALUE - (Integer.MAX_VALUE - 8) > 0 is true. The newCapacity is therefore rightly handled: hugeCapacity method returns MAX_ARRAY_SIZE or Integer.MAX_VALUE.

NB: this is what the // overflow-conscious code comment in this method is saying.

Tunaki
  • 132,869
  • 46
  • 340
  • 423
  • 8
    Good demo on the difference between math and CS – piggybox Oct 16 '15 at 18:47
  • 36
    @piggybox I wouldn't say so. This is math. It's just not math in Z, but in a version of the integers modulo 2^32 (with the canonical representations chosen differently than usual). It's a proper mathematical system, not just "lol computers and their quirks". – harold Oct 17 '15 at 11:18
  • 2
    I would've written code that didn't overflow at all. – Aleksandr Dubinsky Oct 20 '15 at 22:06
  • IIRC processors implement a less-than instruction on signed integers by doing `a - b` and checking if the top bit is a `1`. How do they handle overflow? – Ky - Oct 22 '15 at 18:53
  • 2
    @BenC.R.Leggiero x86, among others, track various conditions through status flags in a separate register for use with conditional instructions. This register has separate bits for the sign of the result, zeroness of the result, and whether over-/underflow occurred in the last arithmetical operation. –  Nov 11 '15 at 00:14
105

I found this explanation:

On Tue, Mar 9, 2010 at 03:02, Kevin L. Stern wrote:

I did a quick search and it appears that Java is indeed two's complement based. Nonetheless, please allow me to point out that, in general, this type of code worries me since I fully expect that at some point someone will come along and do exactly what Dmytro suggested; that is, someone will change:

if (a - b > 0)

to

if (a > b)

and the entire ship will sink. I, personally, like to avoid obscurities such as making integer overflow an essential basis for my algorithm unless there is a good reason to do so. I would, in general, prefer to avoid overflow altogether and to make the overflow scenario more explicit:

if (oldCapacity > RESIZE_OVERFLOW_THRESHOLD) {
   // Do something
} else {
  // Do something else
}

It's a good point.

In ArrayList we cannot do this (or at least not compatibly), because ensureCapacity is a public API and effectively already accepts negative numbers as requests for a positive capacity that cannot be satisfied.

The current API is used like this:

int newcount = count + len;
ensureCapacity(newcount);

If you want to avoid overflow, you would need to change to something less natural like

ensureCapacity(count, len);
int newcount = count + len;

Anyway, I'm keeping the overflow-conscious code, but adding more warning comments, and "out-lining" huge array creation so that ArrayList's code now looks like:

/**
 * Increases the capacity of this <tt>ArrayList</tt> instance, if
 * necessary, to ensure that it can hold at least the number of elements
 * specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
public void ensureCapacity(int minCapacity) {
    modCount++;

    // Overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

/**
 * The maximum size of array to allocate.
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    // Overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);

    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

private int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

Webrev regenerated.

Martin

In Java 6, if you use the API as:

int newcount = count + len;
ensureCapacity(newcount);

And newCount overflows (this becomes negative), if (minCapacity > oldCapacity) will return false and you may mistakenly assume that the ArrayList was increased by len.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Eran
  • 387,369
  • 54
  • 702
  • 768
  • 2
    Nice idea but it is contradicted by [the implementation of `ensureCapacity`](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/ArrayList.java#178); if `minCapacity` is negative, you never get to that point—it’s just as silently ignored as the complicated implementation pretends to prevent. So “we cannot do this” for public API compatibility is a strange argument as they already did. The only callers relying on this behavior are the internal ones. – Holger Oct 15 '15 at 11:58
  • 1
    @Holger If `minCapacity` is very negative (i.e. resulted from `int` overflow when adding the current size of the ArrayList to the number of elements you wished to add), `minCapacity - elementData.length` to overflow again and become positive. That's how I understand it. – Eran Oct 15 '15 at 12:06
  • 1
    @Holger However, they changed it again in Java 8, to `if (minCapacity > minExpand)`, which I don't understand. – Eran Oct 15 '15 at 12:07
  • Yes, the two `addAll` methods are the only case where it is relevant as the sum of current size and number of new elements can overflow. Nevertheless, these are internal calls and the argument “we can’t change it because `ensureCapacity` is a public API” is a strange argument when in fact, `ensureCapacity` does ignore negative values. The Java 8 API didn’t change that behavior, all it does is ignoring capacities below the default capacity when the `ArrayList` is in it’s initial state (i.e. initialized with default capacity and still empty). – Holger Oct 15 '15 at 12:12
  • In other words, the reasoning about `newcount = count + len` is correct when it comes to the internal usage, however, it does not apply to the `public` method `ensureCapacity()`… – Holger Oct 15 '15 at 12:16
19

Looking at the code:

int newCapacity = oldCapacity + (oldCapacity >> 1);

If oldCapacity is quite large, this will overflow, and newCapacity will be a negative number. A comparison like newCapacity < oldCapacity will incorrectly evaluate true and the ArrayList will fail to grow.

Instead, the code as written (newCapacity - minCapacity < 0 returns false) will allow the negative value of newCapacity to be further evaluated in the next line, resulting in recalculating newCapacity by invoking hugeCapacity (newCapacity = hugeCapacity(minCapacity);) to allow for the ArrayList to grow up to MAX_ARRAY_SIZE.

This is what the // overflow-conscious code comment is trying to communicate, though rather obliquely.

So, bottom line, the new comparison protects against allocating an ArrayList larger than the predefined MAX_ARRAY_SIZE while allowing it to grow right up to that limit if needed.

Erick G. Hagstrom
  • 4,873
  • 1
  • 24
  • 38
2

The two forms behave exactly the same unless the expression a - b overflows, in which case they are opposite. If a is a large negative, and b is a large positive, then (a < b) is clearly true, but a - b will overflow to become positive, so (a - b < 0) is false.

If you're familiar with x86 assembly code, consider that (a < b) is implemented by a jge, which branches around the body of the if statement when SF = OF. On the other hand, (a - b < 0) will act like a jns, which branches when SF = 0. Hence, these behave differently precisely when OF = 1.

Doradus
  • 938
  • 1
  • 9
  • 15