0

Browsing through some JDK source code I found the following implementations of Collection.toArray() and toArray(T[]) in AbstractCollection:

public Object[] toArray() {
    // Estimate size of array; be prepared to see more or fewer elements
    Object[] r = new Object[size()];
    Iterator<E> it = iterator();
    for (int i = 0; i < r.length; i++) {
        if (! it.hasNext()) // fewer elements than expected
            return Arrays.copyOf(r, i);
        r[i] = it.next();
    }
    return it.hasNext() ? finishToArray(r, it) : r;
}

public <T> T[] toArray(T[] a) {
    // Estimate size of array; be prepared to see more or fewer elements
    int size = size();
    T[] r = a.length >= size ? a :
              (T[])java.lang.reflect.Array
              .newInstance(a.getClass().getComponentType(), size);
    Iterator<E> it = iterator();

    for (int i = 0; i < r.length; i++) {
        if (! it.hasNext()) { // fewer elements than expected
            if (a == r) {
                r[i] = null; // null-terminate
            } else if (a.length < i) {
                return Arrays.copyOf(r, i);
            } else {
                System.arraycopy(r, 0, a, 0, i);
                if (a.length > i) {
                    a[i] = null;
                }
            }
            return a;
        }
        r[i] = (T)it.next();
    }
    // more elements than expected
    return it.hasNext() ? finishToArray(r, it) : r;
}

private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

@SuppressWarnings("unchecked")
private static <T> T[] finishToArray(T[] r, Iterator<?> it) {
    int i = r.length;
    while (it.hasNext()) {
        int cap = r.length;
        if (i == cap) {
            int newCap = cap + (cap >> 1) + 1;
            // overflow-conscious code
            if (newCap - MAX_ARRAY_SIZE > 0)
                newCap = hugeCapacity(cap + 1);
            r = Arrays.copyOf(r, newCap);
        }
        r[i++] = (T)it.next();
    }
    // trim if overallocated
    return (i == r.length) ? r : Arrays.copyOf(r, i);
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError
            ("Required array size too large");
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

However, since I am now 'default-implementing' a Collection-Interface, I wonder if the implementation of these methods can't be simplified in readability as follows, without having more iterations, array creations or generally worse performance:

@Override
default Object[] toArray() {
    Object[] result = new Object[size()];
    Iterator<E> it = iterator();
    int i = 0;
    while (it.hasNext()) {
        if (i == result.length) // more objects than expected, resize
            result = Arrays.copyOf(result, newSize(i));
        result[i++] = it.next();
    }
    if (i != result.length) // trim array
        result = Arrays.copyOf(result, i);
    return result;
}

@Override
default <T> T[] toArray(T[] a) {
    int size = size();
    T[] result = size > a.length ? (T[]) Array.newInstance(a.getClass().getComponentType(), size) : a;
    Iterator<E> it = iterator();
    int i = 0;
    while (it.hasNext()) {
        if (i == result.length) // more objects than expected, resize
            result = Arrays.copyOf(result, newSize(i));
        result[i++] = (T) it.next();
    }
    if (i != result.length)
        if (result == a) // set next element to null
            a[i] = null;
        else // trim array
            result = Arrays.copyOf(result, i);
    return result;
}

private static int newSize(int currentSize) {
    if (currentSize == Integer.MAX_VALUE)
        throw new OutOfMemoryError("Required array size too large");
    int newSize = currentSize + (currentSize >> 1) + 1;
    if (newSize < 0)
        // overflow - exceed MAX_ARRAY_SIZE from AbstractCollection only if inevitable
        return Math.max(Integer.MAX_VALUE - 8, currentSize + 1);
    // clip at MAX_ARRAY_SIZE from AbstractCollection
    return Math.min(newSize, Integer.MAX_VALUE - 8);
}

I know that 'simplifying' always is subjective to a certain degree but I am especially referring to the many levels of indentation and the second step of thinking when finishToArray(T[], Iterator<?>) is called in the JDK implementation, maybe you get the point.

Since this code was written by Neal Gafter and Joshua Bloch it would be quite presumptuous and precipitate to just call my solution better so I'd like to have your feedback.
Also my code is not tested so if you find some special cases which result in bugs, I'd also like to know them as well.


Edit, since detailed description on what has changed has been requested:
The main point of simplification was in exchanging the for loop from 0 to r.length (being initial size()) to a while loop using the Collection's Iterator, so the loop only ends, when there is actually no more element left, and thus making a finishToArray(..) method unnecesary.
Also the rather complex body in toArray(T[] a)'s for loop has been highly simplified from several if-branches including return statements to just a simple if-branch resizing the array if necessary.

These are the reasons why I wonder if the JDK implementation offers a certain advantage or if mine is somehow wrong in a way I haven't tought about.

Reizo
  • 1,374
  • 12
  • 17
  • I'm voting to close this question as off-topic because this question belong on another site in the Stack Exchange netork: [Code Review](https://codereview.stackexchange.com/) – krokodilko Nov 15 '17 at 20:26
  • 3
    It's unclear what you've changed. Please explain, rather than just dumping code. – Andy Turner Nov 15 '17 at 20:27
  • There's a good reason for keeping `finishToArray` out: It hardly ever gets called and the JIT is better on optimizing short methods. – maaartinus Nov 16 '17 at 08:46
  • @krokodilko I literally posted this on Code Review _before_ but the first commenter there said it was off-topic, since Code Review is about code that I 'actually maintan' and 'not example code'. – Reizo Nov 16 '17 at 11:48
  • @AndyTurner I added a description on what has changed and my according considerations. – Reizo Nov 16 '17 at 12:12
  • @maaartinus I cannot assess the behaviour of the JIT at all. Considering method length I can't find my `toArray(T[] a)` implementation to be any longer/complex than the JDK's. However, since you're mentioning it, I suppose that the test `i < r.length` can be optimized better than `it.hasNext()`, isn't it? – Reizo Nov 16 '17 at 12:21
  • I'd have to create the files from your snippet and compare them side by side - too lazy. Comparing files here is rather impossible. In case, you're really interested: 1. Create a class implementing `toArray(Collection)` in the JDK way. 2. Do the same with your approach. 3. Create a project on github. 4. Post a question to [CR](https://codereview.stackexchange.com/questions/tagged/java) containing *both* the code and a link to github (just click on edit here to get what you wrote). 5. Add a comment mentioning me to this question and delete the question - it gets closed anyway :). – maaartinus Nov 16 '17 at 20:33
  • @maaartinus Thanks for your willingness. [This](https://github.com/Reizo-prime/to-array-implementation) is the github project. To be honest I don't understand why I should open a new question on codereview, tho. – Reizo Nov 17 '17 at 17:25
  • I would say a for loop can be optimized much better by a JIT than a while loop. While loop has open end but a for loop has a definite number of calls. If JIT detects that the end of the for loop (r.lenght) is a fixed number it can even duplicate the code. But it.hasNext() may have any side effect so JIT has to take care. – brummfondel Nov 17 '17 at 20:10
  • @Reizo The question here had already three close votes and I thought it gets closed soon. Now, it still has three close votes, so it might survive. I guess., CR may be better suited, but in no case, I agree with the question being off-topic here. Your github project looks good at the first glance. I hope, I'll get time to comment on in a few hours. – maaartinus Nov 17 '17 at 22:25
  • @brummfondel You can rewrite any `for` loop into an equivalent `while` loop and get the same performance - AFAIK there's no such distinction in the internal representation (which is a graph created from the bytecode). But it's true that counted loops get some special treatment and it may really matter (a counted loop is a loop with a number of iterations limited by an `int` counter, no matter how it's written in the source code). – maaartinus Nov 17 '17 at 22:38

0 Answers0