
I was looking at the implementation of Apache Commons' StringUtils.join method and stumbled upon a line that I assume is there for performance, but I don't understand why it was written this way, with those specific values.

Here's the implementation:

public static String join(Object[] array, String separator, int startIndex, int endIndex) {
    if (array == null) {
        return null;
    }
    if (separator == null) {
        separator = EMPTY;
    }

    // endIndex - startIndex > 0:   Len = NofStrings *(len(firstString) + len(separator))
    //           (Assuming that all Strings are roughly equally long)
    int noOfItems = (endIndex - startIndex);
    if (noOfItems <= 0) {
        return EMPTY;
    }

    StringBuilder buf = new StringBuilder(noOfItems * 16); // THE QUESTION'S ABOUT THIS LINE

    for (int i = startIndex; i < endIndex; i++) {
        if (i > startIndex) {
            buf.append(separator);
        }
        if (array[i] != null) {
            buf.append(array[i]);
        }
    }
    return buf.toString();
}

My questions regard the StringBuilder buf = new StringBuilder(noOfItems * 16); line:

  • I assume giving the StringBuilder an initial capacity is meant to improve performance, so that less resizing is needed while building the string. My question is: how much do these resizing operations actually hurt performance? Does this strategy really improve speed? (In terms of space it could even be worse, if more space than necessary is allocated.)
  • Why is the magic number 16 being used? Why would they assume each String in the array would be 16 characters long? What good does this guess do?
dabadaba
    I don't know, but I would guess that the 16 is just a guess of the average expected size. Sounds about right for the use cases I've generally needed it for. Keep in mind that the StringBuilder will be GC'd in a bit anyway, so it doesn't matter if it's a bit too big. Saving on resizings is nice, because each resizing requires copying over the whole previous array; in the worst case, if you resize on every append, you get O(n^2) performance. – yshavit May 16 '16 at 12:28

3 Answers


16 is a slight over-estimate (presumably based on experience/statistics) of the expected average size of the strings-with-separator.

Pre-allocating enough space to hold the entire result avoids having to replace the backing array mid-build with a larger one (roughly double the size) and copy the existing characters over, which is an O(n) operation each time it happens.

Overestimating, even by quite a bit, and allocating a larger array up front is worth the cost if it avoids the replacement operation in most situations.
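
To make the resizing cost concrete, here is a minimal sketch that counts how often the builder's capacity grows during a join, once with the default capacity and once pre-sized to 16 characters per item. The class name CapacityDemo, its helper methods and the sample data are hypothetical, made up for illustration; only StringBuilder.capacity() and append() are real API. The exact number of resizes with the default capacity depends on the JDK's growth policy, so the sketch does not assume a specific count.

```java
// Sketch only: CapacityDemo and its helpers are illustrative names, not part of Commons Lang.
final class CapacityDemo {

    /** Joins items into buf and returns the number of loop iterations in which the capacity grew. */
    static int countResizes(StringBuilder buf, String[] items, String separator) {
        int resizes = 0;
        int capacity = buf.capacity();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                buf.append(separator);
            }
            buf.append(items[i]);
            if (buf.capacity() != capacity) { // the backing array was replaced by a larger one
                resizes++;
                capacity = buf.capacity();
            }
        }
        return resizes;
    }

    /** Builds sample strings "item-0", "item-1", ... (all shorter than 16 characters). */
    static String[] sampleItems(int count) {
        String[] items = new String[count];
        for (int i = 0; i < count; i++) {
            items[i] = "item-" + i;
        }
        return items;
    }

    public static void main(String[] args) {
        String[] items = sampleItems(1000);
        int withDefault = countResizes(new StringBuilder(), items, ",");
        int preSized = countResizes(new StringBuilder(items.length * 16), items, ",");
        System.out.println("default capacity: " + withDefault + " resizes");
        System.out.println("pre-sized (n * 16): " + preSized + " resizes");
    }
}
```

With these short sample strings the pre-sized builder never has to grow, while the default-capacity builder has to replace its array several times, copying everything appended so far each time.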

Bohemian

Actually, 16 is not always hard-coded the way you describe in your question.

If you look at the definition again, you will find something like this:

bufSize *= ((array[startIndex] == null ? 16 : array[startIndex].toString().length())
                        + separator.length());
     // 16 is used only if the element at startIndex is null.

        StringBuffer buf = new StringBuffer(bufSize); // with a null first element, each item is estimated at 16 characters plus the separator

Here the StringBuffer constructor that takes an initial capacity is called:

     StringBuffer(int capacity)
Constructs a string buffer with no characters in it and the specified initial capacity.

If the array holds a non-null element at index startIndex, the per-item estimate is the length of that element's toString() result (plus the separator length) instead of 16.
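
As a worked example of that sizing rule, the sketch below reproduces just the arithmetic from the snippet above in a standalone helper. The class LegacySizing and the method estimateCapacity are hypothetical names made up for illustration, not part of Commons Lang, and the sketch assumes a non-null separator.

```java
// Sketch only: mirrors the bufSize arithmetic quoted above; names are illustrative.
final class LegacySizing {

    /** Estimated capacity: item count * (length of first item, or 16 if it is null, + separator length). */
    static int estimateCapacity(Object[] array, String separator, int startIndex, int endIndex) {
        int noOfItems = endIndex - startIndex;
        if (noOfItems <= 0) {
            return 0; // nothing to join
        }
        int firstLength = array[startIndex] == null
                ? 16
                : array[startIndex].toString().length();
        return noOfItems * (firstLength + separator.length());
    }

    public static void main(String[] args) {
        // First element "abc" has length 3, separator ", " has length 2: 3 * (3 + 2) = 15
        System.out.println(estimateCapacity(new Object[] {"abc", "de", "f"}, ", ", 0, 3));
        // First element is null, so 16 is used: 2 * (16 + 1) = 34
        System.out.println(estimateCapacity(new Object[] {null, "x"}, "-", 0, 2));
    }
}
```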

Thank you.

Vikrant Kashyap

Hmm... StringUtils.join can throw an OutOfMemoryError on big arrays, so be aware of that case.

seunggabi