
I have read a lot of conflicting articles about memory allocation when a String is created. Some say that the new operator creates a String on the heap and a String literal is created in the String pool [heap], while others say that the new operator creates one object on the heap and another in the String pool.

To analyse this, I wrote the program below, which prints the hash code of the String's internal char array and the identity hash code of the String object:

import java.lang.reflect.Field;

public class StringAnalysis {

    private int showInternalCharArrayHashCode(String s)
            throws SecurityException, NoSuchFieldException,
            IllegalArgumentException, IllegalAccessException {
        final Field value = String.class.getDeclaredField("value");
        value.setAccessible(true);
        return value.get(s).hashCode();
    }

    public void printStringAnalysis(String s) throws SecurityException,
            IllegalArgumentException, NoSuchFieldException,
            IllegalAccessException {
        System.out.println(showInternalCharArrayHashCode(s));

        System.out.println(System.identityHashCode(s));

    }

    public static void main(String args[]) throws SecurityException,
            IllegalArgumentException, NoSuchFieldException,
            IllegalAccessException, InterruptedException {
        StringAnalysis sa = new StringAnalysis();
        String s1 = new String("myTestString");
        String s2 = new String("myTestString");
        String s3 = s1.intern();
        String s4 = "myTestString";

        System.out.println("Analyse s1");
        sa.printStringAnalysis(s1);

        System.out.println("Analyse s2");
        sa.printStringAnalysis(s2);

        System.out.println("Analyse s3");
        sa.printStringAnalysis(s3);

        System.out.println("Analyse s4");
        sa.printStringAnalysis(s4);

    }

}

This program prints the following output:

Analyse s1
1569228633
778966024
Analyse s2
1569228633
1021653256
Analyse s3
1569228633
1794515827
Analyse s4
1569228633
1794515827

From this output, one thing is clear: irrespective of how a String is created, Strings with the same value share the same char array.

Now my question is: where is this char array stored? Is it on the heap, or does it go to permgen? I also want to understand how to differentiate between heap memory addresses and permgen memory addresses.

I have a big problem if it is stored in permgen, as it will eat up my precious, limited permgen space. And if the char array is stored on the heap rather than in permgen, does that imply that String literals also use heap space [which is something I have never read]?

Lokesh
  • The Java compiler is simply too clever. Try `"...".toCharArray()` or such. But then the information level sinks to zero. – Joop Eggen Apr 22 '13 at 16:45
  • [this](http://www.precisejava.com/javaperf/j2se/StringAndStringBuffer.htm) may be useful – Anirudha Apr 22 '13 at 16:46
  • It would be more convincing if you built a `String` from a `StringBuilder`, perhaps by calling a separate routine to append parts of the string value. – Ted Hopp Apr 22 '13 at 16:47
  • @Anirudh: I read that link but it doesn't talk about internal char arrays of Strings. – Lokesh Apr 22 '13 at 16:52
  • @JoopEggen: I didn't get your point. Can you please elaborate a little? – Lokesh Apr 22 '13 at 16:54
  • The String internal char array being the same for all, is possible as the string literal is identical for all. Nice that no internal allocation from char array happens. `"myTestString"` seems to be 1794515827, where `intern` seems clever too, not creating a new String constant. – Joop Eggen Apr 22 '13 at 17:16

3 Answers

From the String source:

 public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

It's clear that a string created with this constructor shares its char array (value) with the original string.

It's important to note that the API does not guarantee this sharing:

Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable

For example, String.substring used to share the char array with the original string, but in recent versions of Java 1.7 (from update 6 onward) String.substring makes a copy of the char array.

Evgeniy Dorofeev
  • FYI, your source example is from an earlier version of Java (1.5, I'm guessing). It led to a *lot* of unexpected memory exceptions, which is why the current (1.6/1.7) versions look at the size of the backing array versus the reported size of the string. – parsifal Apr 22 '13 at 17:22
  • @Evgeniy: That explains why char array is same for all Strings created using new and it shares it with String literal. Is there any way to test where this char array is created in heap or permgen? – Lokesh Apr 22 '13 at 17:27
  • If str == str.intern(), it means str was in permgen. – Evgeniy Dorofeev Apr 22 '13 at 17:36
  • @Evgeniy:str == str.intern() will tell me about String object reference but not about internal char array. – Lokesh Apr 23 '13 at 15:33
  • I agree, but 1) at least if String is in permgen then its char array is guaranteed to be there too 2) there is hardly any other way – Evgeniy Dorofeev Apr 23 '13 at 15:43
  • I also feel it should be that way, but my problem is articles like this one: http://theopentutorials.com/tutorials/java/strings/string-literal-pool/ . It states otherwise, and I can't find any standard article stating otherwise. – Lokesh Apr 23 '13 at 16:49
  • @EvgeniyDorofeev - "if String is in permgen then its char array is guaranteed to be there too" do you have a reference for this? – parsifal Apr 23 '13 at 17:40
  • Right, "guaranteed" was too strong without a reference to the JLS, but if you take loki's showInternalCharArrayHashCode and print value's hashCode before and after interning, you will see that the hashCode changed, which in my opinion means that the char[] location changed. I can imagine only one reason: it was moved to permgen. – Evgeniy Dorofeev Apr 24 '13 at 02:36
  • @EvgeniyDorofeev: Char array hashcode has not changed even after interning in the output, which means String in permgen and String in Heap share same char array!! – Lokesh Apr 24 '13 at 12:19

From this output, one thing is clear: irrespective of how a String is created, Strings with the same value share the same char array.

Not quite: this is happening because you start with one literal string and create multiple instances from it. In the OpenJDK (Sun/Oracle) 6 implementation, the backing array is shared when it represents the entire string, and is copied only when it is larger than the string (for example, when the original came from substring). You can see this in src.jar, or here: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.%3Cinit%3E%28java.lang.String%29

If you carefully construct your source strings such that they start from different character arrays, you'll find that they don't share the backing array.
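One way to check this is sketched below. It assumes the implementation keeps its characters in a private field named value (true for OpenJDK, but not guaranteed by the API), and the helper name sharesBackingStore is made up for this example. On newer JDKs reflective access to String internals may be refused unless the JVM is started with --add-opens java.base/java.lang=ALL-UNNAMED, so the helper returns null in that case instead of failing.

```java
import java.lang.reflect.Field;

public class BackingArrayDemo {

    // Hypothetical helper: returns TRUE/FALSE when the private "value" field
    // can be read, or null when the field is missing or access is refused.
    public static Boolean sharesBackingStore(String a, String b) {
        try {
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true);
            // Reference comparison: are both strings backed by the same array object?
            return value.get(a) == value.get(b);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null; // implementation detail not observable on this JVM
        }
    }

    public static void main(String[] args) {
        String literal = "myTestString";
        String copy = new String(literal);

        // Two strings built from two separate char arrays with equal contents.
        String fresh1 = new String(new char[] {'A', 'B', 'C', 'D', 'E'});
        String fresh2 = new String(new char[] {'A', 'B', 'C', 'D', 'E'});

        System.out.println("literal vs copy:  " + sharesBackingStore(literal, copy));
        System.out.println("fresh1 vs fresh2: " + sharesBackingStore(fresh1, fresh2));
    }
}
```

On the implementations discussed here I would expect the first line to report sharing and the second not to, but that is implementation behaviour rather than anything the API promises.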

Now my question is: where is this char array stored?

To the best of my knowledge, the character array for a string literal is stored on the heap (those with better knowledge of classloading internals, feel free to comment). Strings loaded from files will always store their backing arrays on the heap.

What I do know for sure is that the data structure used by intern() only references the String object, not its character array.

parsifal
  • I have checked in JLS, String objects whether literals or new Strings are indeed stored in heap. String pool is just a set of references. So char array indeed goes to heap. – Lokesh May 21 '13 at 03:55

Last first: By definition, the literal "myTestString" is interned, and all interned String references with the same value refer to the same physical String object. So the literal will be the EXACT SAME STRING as the result from intern.
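A small sketch of that point, using only reference comparisons (no reflection). The class and method names are made up for the example. The string is built at run time with a StringBuilder so that the compiler cannot fold it into the same constant as the literal.

```java
public class InternIdentityDemo {

    // A string assembled at run time is a separate object from the literal,
    // even though both hold the same characters.
    public static boolean builtIsDistinctObject() {
        String literal = "myTestString";
        String built = new StringBuilder("myTest").append("String").toString();
        return built != literal;
    }

    // intern() hands back the one pooled String, which is the literal itself.
    public static boolean internReturnsLiteral() {
        String literal = "myTestString";
        String built = new StringBuilder("myTest").append("String").toString();
        return built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println("built != literal:          " + builtIsDistinctObject());
        System.out.println("built.intern() == literal: " + internReturnsLiteral());
    }
}
```

Both lines print true, which is the same relationship the question shows between s1, s3 and s4.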

[Corrected] By definition, the hashCode (but not the identityHashCode) of two Strings with identical character sequence values will be identical.

The hashCode of a char[] array, on the other hand, is simply a jumble of its address bits and bears no relation to the contents of the array. This indicates that the value array is, in all above cases, the exact same array.

(Further info: The old implementation of String included a pointer to a char[], an offset, a length, and a hashCode value. Newer implementations drop the offset field, so the String value begins at element 0 of the array. Other (non-Sun/non-Oracle) implementations do away with the separate char[] array and include the String bytes inside the main heap allocation. There is no requirement that the value field actually exist.)

[Continued] Copied over the test case and added a few lines. hashCode and identityHashCode produce the same values on a given char[], but produce different values on different arrays with the same contents.
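The extra lines were roughly along these lines. This is a reconstruction for illustration, not the exact code I ran, and the class and method names are made up. Arrays do not override hashCode(), so a1.hashCode() is the identity hash, while String.hashCode() and Arrays.hashCode() depend only on the contents.

```java
import java.util.Arrays;

public class ArrayHashDemo {

    // Arrays inherit Object.hashCode(), so it matches System.identityHashCode.
    public static boolean arrayHashIsIdentityHash(char[] a) {
        return a.hashCode() == System.identityHashCode(a);
    }

    // Content-based hashes agree for arrays holding the same characters.
    public static boolean contentHashesMatch(char[] a, char[] b) {
        return Arrays.hashCode(a) == Arrays.hashCode(b)
                && new String(a).hashCode() == new String(b).hashCode();
    }

    public static void main(String[] args) {
        char[] a1 = {'A', 'B', 'C', 'D', 'E'};
        char[] a2 = {'A', 'B', 'C', 'D', 'E'};

        // Identity-based values: these vary from run to run and per array.
        System.out.println("a1.hashCode() = " + a1.hashCode());
        System.out.println("a2.hashCode() = " + a2.hashCode());
        System.out.println("a1 identityHashCode = " + System.identityHashCode(a1));
        System.out.println("a2 identityHashCode = " + System.identityHashCode(a2));

        // Content-based value: identical for both strings.
        System.out.println("a1 String's hash = " + new String(a1).hashCode()); // 62061635
        System.out.println("a2 String's hash = " + new String(a2).hashCode()); // 62061635
    }
}
```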

The fact that the arrays are identical in s1 and s2 is almost certainly because they are sharing the char[] array of the interned literal "myTestString". If the Strings were separately constructed from "fresh" char[] arrays they would be different.

The main take-away from all this is that String literals are interned, and the implementation being tested "borrows" the array of the source when a String is copied with new String(String).

Char array hash codes
a1.hashCode() = 675303090
a2.hashCode() = 367959235
a1 identityHashCode = 675303090
a2 identityHashCode = 367959235
Strings from char arrays
a1 String = ABCDE
a1 String's hash = 62061635
a1 String value's identityHashCode = 510044439
a2 String = ABCDE
a2 String's hash = 62061635
a2 String value's identityHashCode = 1709651096
Hot Licks
  • "By definition, String.hashCode and System.identityHashCode on a String return the same value" -- do you have a reference for this? Because it certainly isn't what the [docs](http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#identityHashCode(java.lang.Object)) say. – parsifal Apr 22 '13 at 17:18
  • @parsifal - OK, you got me there -- misread the spec slightly. identityHashCode presumably returns the "jumbled up address" version of the hash and would therefore identify different (but "identical") objects. – Hot Licks Apr 22 '13 at 17:23
  • @HotLicks: If you see the output the hashcode of char arrays is same for all Strings. So this is not correct "So it's no surprise that the first two array hashes are different" – Lokesh Apr 22 '13 at 17:24