7

This is a question that we have had trouble understanding. It's tricky to describe it using text but I hope that the gist will be understood.

I understand that a string's actual content is held in an internal char array. Normally the retained heap size of the string is 40 bytes plus the size of the character array. This is explained here. When substring is called, the new string keeps a reference to the original string's character array, so the retained size of the character array can be much bigger than the substring itself.

However, when profiling memory usage with YourKit or MAT, something strange seems to happen: the retained size of the string that references the char array does not include the retained size of the character array.

An example could be as follows (semi pseudo-code):

String date = "2011-11-33"; (24 bytes)
date.value = char[1172]; (2360 bytes)

The string's retained size is reported as 24 bytes, without including the character array's retained size. This could make sense if there are many references to the character array due to many substring operations.
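The 2360-byte figure is consistent with simple arithmetic if one assumes a 16-byte array header, 2 bytes per char and 8-byte alignment. Those three constants are assumptions (they are typical for a 64-bit HotSpot JVM with compressed references, but the profiler output above does not state them). A rough sketch:

```java
public class CharArraySize {
    // Assumed layout, not taken from the profiler: 16-byte array header,
    // 2 bytes per char, objects padded to a multiple of 8 bytes.
    static final int ARRAY_HEADER_BYTES = 16;
    static final int BYTES_PER_CHAR = 2;
    static final int ALIGNMENT = 8;

    // Estimated shallow size of a char[] of the given length.
    static long charArrayBytes(int length) {
        long raw = ARRAY_HEADER_BYTES + (long) BYTES_PER_CHAR * length;
        // Round up to the next multiple of ALIGNMENT.
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        System.out.println(charArrayBytes(1172)); // 16 + 2 * 1172 = 2360
    }
}
```

On a JVM with different header sizes the constants would need adjusting, so treat the result as an estimate only.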

Now when this string is included in some type of collection such as an array or list then the retained size of this array will include the retained size of all the strings including the character array's retained size.

We then have a situation like this:

Array's retained size = 300 bytes
array[0] = String 40 bytes;
array[1] = String 40 bytes;
array[1].value = char[] (220 bytes)

You therefore have to look into each array entry to try to work out where the retained size comes from.

Again this can be explained in that the array holds all the strings that hold references to the same character array and therefore altogether the array's retained size is correct.

Now we get to the problem.

I keep in a separate object a reference to the array that I discussed above as well as a different array with the same strings. In both arrays the strings refer to the same character array. This is expected - after all we are talking about the same string. However the retained size of this character array is counted for both arrays in this new object. In other words the retained size seems to be doubled. If I delete the first array then the second array will still hold a reference to the character array and vice versa. This is confusing because it looks as if Java is holding two separate copies of the same character array. How can this be? Is this a problem with Java's memory management, or is it just the way that the profilers display information?

This problem caused a lot of headaches for us in trying to track down huge memory usage in our application.

Again - I hope that someone out there will be able to understand the question and explain it.

Thanks for your help

slbruce

4 Answers

4

I keep in a separate object a reference to the array that I discussed above as well as a different array with the same strings. In both arrays the strings refer to the same character array. This is expected - after all we are talking about the same string. However the retained size of this character array is counted for both arrays in this new object. In other words the retained size seems to be double.

What you have here is a transitive reference in a dominator tree:

enter image description here

The character array should not show up in the retained size of either array. If the profiler displays it that way, then that's misleading.
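To make the dominator idea concrete, here is a minimal sketch on a toy object graph. The node names and byte sizes are hypothetical (loosely based on the numbers in the question), and the retained size is computed from its definition (the bytes that become unreachable when a node is removed), not the way any particular profiler implements it.

```java
import java.util.*;

public class RetainedSizeSketch {
    // Toy heap: node name -> outgoing references, and node name -> shallow size.
    static final Map<String, List<String>> REFS = new HashMap<>();
    static final Map<String, Integer> SHALLOW = new HashMap<>();

    static {
        // Hypothetical sizes; the holder references both arrays,
        // and both arrays reference the same string.
        node("holder", 16, "arrayA", "arrayB");
        node("arrayA", 24, "string");
        node("arrayB", 24, "string");
        node("string", 40, "chars");
        node("chars", 220);
    }

    static void node(String name, int shallow, String... refs) {
        SHALLOW.put(name, shallow);
        REFS.put(name, Arrays.asList(refs));
    }

    // Sum of shallow sizes reachable from root, pretending 'removed' does not exist.
    static int reachableBytes(String root, String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        if (!root.equals(removed)) stack.push(root);
        int total = 0;
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (!seen.add(n)) continue; // already counted
            total += SHALLOW.get(n);
            for (String next : REFS.get(n)) {
                if (!next.equals(removed)) stack.push(next);
            }
        }
        return total;
    }

    // Retained size = bytes that would be freed if 'node' went away.
    static int retained(String node) {
        return reachableBytes("holder", null) - reachableBytes("holder", node);
    }

    public static void main(String[] args) {
        for (String n : Arrays.asList("holder", "arrayA", "arrayB", "string", "chars")) {
            System.out.println(n + " retains " + retained(n) + " bytes");
        }
    }
}
```

In this toy graph each array retains only its own 24 bytes, because the string stays reachable through the other array. The string and its character array are counted once, in the retained size of the holder.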

This is how JProfiler shows this situation in the biggest objects view:

enter image description here

The string instance that is contained in both arrays is shown outside the array instances, with a [transitive reference] label. If you want to explore the actual paths, you can add the array holder and the string to the graph and find all paths between them:

enter image description here

Disclaimer: My company develops JProfiler.

Ingo Kegel
  • I will download the evaluation of jprofiler to see if it makes more sense. Thanks for your answer though. It looks to make more sense... – slbruce Dec 08 '11 at 11:50
  • Unfortunately I found jprofiler very difficult to use. I don't have time to learn how to use it to its full potential so I will just take your word for it :) Thanks for your help – slbruce Dec 11 '11 at 06:59
  • As a token of your appreciation, you could accept my answer :-) And let me assure you that JProfiler is not difficult to use at all. For the example above, you just take a heap snapshot, select the class that holds the arrays and activate the "biggest objects" view. – Ingo Kegel Dec 11 '11 at 07:50
  • I accepted your answer because it did go a long way in helping me understand what was going on. I am still a bit sceptical of jprofiler though ;) – slbruce Jan 08 '12 at 10:41
3

I'd say it is just the way the profiler displays the information. It has no idea that the two arrays should be considered for "deduplication". How about you wrap the two arrays into some kind of dummy holder object, and run your profiler against that? Then, it should be able to take care of the "double-counting".

Thilo
  • I agree... profiler is probably counting the string internal arrays two times. – RokL Dec 08 '11 at 08:53
  • I would tend to agree however this problem seems to cause full gc to occur when it might not be necessary - in other words - even java sees it this way – slbruce Dec 08 '11 at 09:36
  • So you are saying that Java is confused as to how much heap space is used and how much is free (and counts the same object twice)? That seems unlikely... – Thilo Dec 08 '11 at 09:49
  • I agree. I think java knows exactly what it is doing - it is just communicating it in a confusing way! – slbruce Dec 08 '11 at 11:48
0

Unless the strings are interned, they can be equals() but not ==. When constructing a String object from a char array, the constructor will make a copy of the char array. (This is the only way to shield the immutable String from later changes in the char array values.)
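A small sketch of both points, using only standard java.lang.String calls; the class and method names are made up for the illustration:

```java
public class StringCopyDemo {
    // Builds a String from the array, then mutates the array afterwards.
    static String buildThenMutate(char[] chars) {
        String s = new String(chars); // the constructor copies the array contents
        chars[0] = 'x';               // does not affect the String
        return s;
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};
        String s1 = new String(chars);
        String s2 = new String(chars);

        System.out.println(s1.equals(s2));              // true: same contents
        System.out.println(s1 == s2);                   // false: two distinct objects
        System.out.println(s1.intern() == s2.intern()); // true: interned to one instance

        System.out.println(buildThenMutate(chars));     // abc
    }
}
```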

Ted Hopp
  • I think he was talking about the two arrays having the exact same String instances, though. – Thilo Dec 08 '11 at 08:12
  • @Thilo - I was picking up on _"In both arrays the strings refer to the same character array."_ It is hard to ensure that without interning the strings. – Ted Hopp Dec 08 '11 at 08:17
  • Actually it is trivial to ensure that: `String s2 = s1.substring(0)`. You are right, the new String(char[]) constructor will copy the char array. The new String(String) constructor, however, behaves differently on IBM JVMs than on the Sun JVM. – RokL Dec 08 '11 at 09:13
  • I want the strings to refer to the same character array. This seems like a good thing in a lot of cases – slbruce Dec 08 '11 at 09:38
  • 1
    @slbruce - I think you can get two String objects that share the same char array as follows: `String a = new String(chars); String b = a.substring(0);`. Neither String will use `chars` as the char array, but they should share the same char array between them. – Ted Hopp Dec 08 '11 at 18:27
0

If you run the following with -XX:-UseTLAB:

public class SubstringMemory {
    public static void main(String... args) throws Exception {
        StringBuilder text = new StringBuilder();
        text.append(new char[1024]);
        long free1 = free();
        String str = text.toString();
        long free2 = free();
        String[] array = { str.substring(0, 100), str.substring(101, 200) };
        long free3 = free();
        // With TLABs enabled, freeMemory() may not change for small allocations.
        if (free3 == free2)
            System.err.println("You must use -XX:-UseTLAB");
        System.out.println("To create String with 1024 chars " + (free1 - free2)
                + " bytes\nand to create an array with two sub-string was " + (free2 - free3));
    }

    // Free heap as reported by the runtime; only a rough measure.
    private static long free() {
        return Runtime.getRuntime().freeMemory();
    }
}

prints

To create String with 1024 chars 2096 bytes
and to create an array with two sub-string was 88

You can see it's consuming more memory than you might expect if they shared the same backing store.

If you look at the code in the String class:

public String substring(int beginIndex, int endIndex) {
    // checks.
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

You can see that substring for String doesn't take a copy of the underlying value array.
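The sharing can be illustrated with a toy model. `Slice` below is a hypothetical stand-in that mirrors the value/offset/count fields shown above; it is not the real java.lang.String, and as far as I know later JDK releases (from 7u6) changed substring to copy instead, so this reflects only the implementation quoted here.

```java
public class SharedSubstringSketch {
    // Hypothetical stand-in for the String fields quoted above.
    static final class Slice {
        final char[] value; // backing array, possibly shared with other slices
        final int offset;   // index of the first char of this slice in value
        final int count;    // number of chars in this slice

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Like the quoted substring: a new object, but the same backing array.
        Slice substring(int beginIndex, int endIndex) {
            return new Slice(value, offset + beginIndex, endIndex - beginIndex);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        char[] data = "2011-11-30 and a long tail of other text".toCharArray();
        Slice whole = new Slice(data, 0, data.length);
        Slice date = whole.substring(0, 10);

        System.out.println(date);                      // 2011-11-30
        System.out.println(date.value == whole.value); // true: array is shared
        System.out.println(date.value.length);         // full length, not 10
    }
}
```

This is why a 10-character substring can keep a much larger character array alive in this model.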


Another thing to consider is -XX:+UseCompressedStrings, which is available on some Java 6 updates (as far as I know it has to be enabled explicitly). It encourages the JVM to use byte[] instead of char[] where possible.

The size of the headers for the String and array objects varies between 32-bit JVMs, 64-bit JVMs with 32-bit references and 64-bit JVMs with 64-bit references.

Peter Lawrey
  • 3
    I don't know where you've found that substring implementation, but in Oracle/Sun and IBM JVMs, substring will NOT copy the array. – RokL Dec 08 '11 at 08:49
  • There is a bug in my code! The substring is from StringBuilder which must take a copy. – Peter Lawrey Dec 08 '11 at 09:19
  • Agreed. This is definitely not the behaviour that I see – slbruce Dec 08 '11 at 09:37
  • 1
    Fun fact: StringBuilder doesn't have to make a copy. In IBM JVM, StringBuilder.toString() will, in case backing array isn't wasting too much space, construct a new string using its own backing array and set shared flag on true. Only subsequent changes to StringBuilder will trigger an array copy - by checking shared flag. subString could use the same mechanic but for some reason it doesn't. – RokL Dec 08 '11 at 10:08