In Java 6, String had a package-private constructor that returned a new String sharing the same char array, with only the offset and count changed.

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

It was marked deprecated in Java 7 and removed in Java 8. I was relying on an API that called subSequence repeatedly, until I ran into a performance problem.

Digging into the code, I saw that subSequence used this constructor in Java 6. Now it uses a different constructor that copies the requested range of the underlying array, turning an O(1) operation into an O(n) one.
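To make the difference concrete, here is a minimal sketch of the two strategies. `SliceDemo` and its method names are hypothetical stand-ins of my own, not the JDK's actual String source; it only assumes the offset/count layout shown in the constructor above.

```java
import java.util.Arrays;

/** Hypothetical stand-in for String, used only to contrast the two strategies. */
public final class SliceDemo {
    private final char[] value;
    private final int offset;
    private final int count;

    public SliceDemo(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Java 6 style: O(1), the result shares this object's array. */
    public SliceDemo sharedSlice(int begin, int end) {
        checkRange(begin, end);
        return new SliceDemo(value, offset + begin, end - begin);
    }

    /** Java 7u6+ style: O(n), the result owns a copy of the requested range. */
    public SliceDemo copiedSlice(int begin, int end) {
        checkRange(begin, end);
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new SliceDemo(copy, 0, copy.length);
    }

    /** Size of the array this object keeps reachable. */
    public int backingArrayLength() {
        return value.length;
    }

    private void checkRange(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException("begin=" + begin + ", end=" + end);
        }
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] chars = "a very large string".toCharArray();
        SliceDemo big = new SliceDemo(chars, 0, chars.length);
        System.out.println(big.sharedSlice(2, 6));                       // prints "very"
        System.out.println(big.sharedSlice(2, 6).backingArrayLength());  // prints 19
        System.out.println(big.copiedSlice(2, 6).backingArrayLength());  // prints 4
    }
}
```

The shared slice does no copying but keeps all 19 chars reachable; the copied slice pays for the copy but retains only the 4 chars it needs.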

Replacing the problematic calls to subSequence improved performance by a factor of 10.

I was wondering why this change was made. The only reason I can think of is that sharing the array could cause memory leaks, for example:

String veryLargeString = ....;
String target = veryLargeString.substring(0, 10);
// assume veryLargeString is no longer needed at this point

At that point the underlying char array can't be garbage collected, because it is still referenced by the target String. So a large array stays in memory even though only its first 10 values are needed.
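For reference, the idiom commonly used on Java 6 to avoid this retention was to wrap the substring in `new String(...)`, which copied only the referenced range when the backing array was larger than needed. The sketch below shows that idiom; `PrefixCopy` and `prefixCopy` are hypothetical names used only for illustration. On Java 7u6 and later, substring already copies, so the extra wrapper is redundant there.

```java
public final class PrefixCopy {
    /** Returns the first n chars of source as a String with its own storage. */
    public static String prefixCopy(String source, int n) {
        // On Java 6 the copy constructor trimmed the shared array to the
        // substring's range; on newer JVMs substring has already copied.
        return new String(source.substring(0, n));
    }

    public static void main(String[] args) {
        String veryLargeString = "0123456789abcdefghijklmnopqrstuvwxyz";
        String target = prefixCopy(veryLargeString, 10);
        System.out.println(target); // prints "0123456789"
    }
}
```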

Is this the only reason, or are there other reasons why this constructor was removed?

user2336315
  • I think part of the thinking could've been that in the rare cases this does cause a problem you can work around it by using your own class wrapping a `char[]`. Not ideal but it's more straightforward than chasing down obscure memory leaks. – biziclop Jun 09 '15 at 18:19
  • @biziclop: the simplest O(1) implementation, without rolling your own anything, is `CharBuffer.wrap(string).subSequence(start, end)`, which yields you a `CharSequence` in O(1). – Louis Wasserman Jun 09 '15 at 18:23
  • The main motivation, I believe, is the eventual "co-location" of `String` and its `char[]`. Right now they are located some distance apart in memory, which is a major penalty on cache lines. If every `String` owns its `char[]`, the JVM can merge them together, and reading will be much faster. – ZhongYu Jun 09 '15 at 18:51
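
The `CharBuffer.wrap(string).subSequence(start, end)` suggestion from the comments can be sketched as follows. The class and method names (`CharBufferSlice`, `slice`) are hypothetical; only the `java.nio.CharBuffer` calls come from the comment itself.

```java
import java.nio.CharBuffer;

public final class CharBufferSlice {
    /** Returns a view of text[start, end) without copying the characters. */
    public static CharSequence slice(String text, int start, int end) {
        // CharBuffer.wrap(CharSequence) creates a read-only buffer backed by the string.
        return CharBuffer.wrap(text).subSequence(start, end);
    }

    public static void main(String[] args) {
        CharSequence view = slice("hello world", 6, 11);
        System.out.println(view);          // prints "world"
        System.out.println(view.length()); // prints 5
    }
}
```

Note that the view keeps the whole source string reachable, so it brings back the same retention trade-off the question describes; it suits short-lived slices better than long-lived ones.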

1 Answer

Yes, String was changed significantly in Java 7 update 6 - now separate String objects never share an underlying char[]. This was definitely a trade-off:

  • Strings no longer need to maintain an offset and length (saving two fields per instance; not a lot, but it is for every string...)
  • One small string can't end up keeping a huge char[] alive (as per your post)
  • ... but operations that were previously cheap now end up creating copies

In some use cases, the previous code would work better - in others, the new code would work better. It sounds like you're in the first camp, unfortunately. I can't imagine this decision was made lightly, however - I suspect that a lot of tests against different common workloads were run.

Jon Skeet