I have recently been studying the HotSpot JVM. While learning about the string constant pool and the String.intern() method, I ran into a very strange situation. After reading many answers I still can't explain it, so I'm posting it here for discussion.

    public static void main(String[] args) {
        String s1 = new String("12") + new String("21");
        s1.intern();
        String s2 = "1221";
        System.out.println(s1 == s2); // true
    }

    public static void main(String[] args) {
        String s1 = new String("12") + new String("21");
        // s1.intern();
        String s2 = "1221";
        System.out.println(s1 == s2); // false
    }

The results are from Java 8.

So the only difference between the two snippets is whether s1.intern() is called.

Here is the documentation of the intern method:

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
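Both branches of that contract can be observed directly. The sketch below is only an illustration: the class name `InternContractDemo` and its helpers are made up for this example, and it relies on the assumption that a random UUID-based string is not already in the pool.

```java
import java.util.UUID;

public class InternContractDemo {
    // Hypothetical helper: builds a fresh heap String at run time whose
    // content is assumed not to be in the pool yet (random UUID suffix).
    static String freshContent() {
        return "demo-" + UUID.randomUUID();
    }

    // "Otherwise" branch: no equal string is pooled, so the receiver itself
    // is added to the pool and returned.
    static boolean firstInternReturnsReceiver() {
        String first = freshContent();
        return first.intern() == first;
    }

    // "Already contains" branch: an equal string is pooled, so intern()
    // returns that pooled instance instead of the receiver.
    static boolean laterInternReturnsPooledInstance() {
        String first = freshContent();
        String copy = new String(first); // equal content, different object
        String pooled = first.intern();
        String result = copy.intern();
        return result == pooled && result != copy;
    }

    public static void main(String[] args) {
        System.out.println(firstInternReturnsReceiver());
        System.out.println(laterInternReturnsPooledInstance());
    }
}
```

On a HotSpot JVM from JDK 7 onward both lines should print true. The second method is the reason the return value of intern() normally matters: the receiver itself is not changed by the call.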

Here is my understanding:

  1. Inspecting the bytecode file shows "12", "21" and "1221" in the constant pool.
  2. When the class is loaded, the constant pool of the bytecode file is loaded into the run-time constant pool, so the string pool contains "12", "21" and "1221".
  3. new String("12") creates a String instance on the heap, which is different from the "12" in the string pool. The same goes for new String("21").
  4. The "+" operator is compiled into a StringBuilder with calls to its append and toString methods, as can be seen in the bytecode.
  5. toString calls new String, so s1 is a String instance "1221" on the heap.
  6. s1.intern() looks into the string pool and finds "1221" already there, so it does nothing. Besides, we don't use the return value, so the call has no effect on s1.
  7. String s2 = "1221" just loads the "1221" instance from the string pool. In the bytecode this is ldc #11, where #11 is the index of "1221" in the constant pool.
  8. The "==" operator compares the addresses of reference types. s1 points to the instance on the heap and s2 points to the instance in the string pool. How can these two be equal?
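Steps 3 to 5 can be written out by hand. The sketch below assumes the Java 8 translation of "+" into a StringBuilder chain described in step 4; the class and method names are made up for this example. It shows that the concatenation result is a new heap object with the content "1221", which cannot be the instance the literal refers to as long as intern() is never called on it.

```java
public class ConcatDesugarDemo {
    // Hand-written equivalent of: new String("12") + new String("21")
    // (assumed to mirror the StringBuilder calls visible in the Java 8 bytecode).
    static String concat() {
        return new StringBuilder()
                .append(new String("12"))
                .append(new String("21"))
                .toString(); // StringBuilder.toString() allocates a new String
    }

    public static void main(String[] args) {
        String s1 = concat();
        String s2 = "1221";
        System.out.println(s1.equals(s2)); // true: same characters
        System.out.println(s1 == s2);      // false: s1 was never interned
    }
}
```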

My questions:

  1. What exactly do s1 and s2 point to?
  2. Why does calling intern() change the behavior, even though the return value is not used?

Here are my guesses:

  1. The string pool is not initialized when the class is loaded. Some answers say that s1.intern() is the first time "1221" is put into the string pool. But then how to explain that "1221" is in the constant pool of the bytecode file? Is there any specification of when the string pool is populated?

  2. Another explanation is that intern just saves a reference to the instance on the heap. But then the references s1 and s2 would still be different: s1 points to the heap, s2 points to the string pool, and the string pool points to the heap. A reference is different from a reference to a reference.

DracoYu
  • You're overlooking the possibility that the compiler elided your redundant `new String("12")` and `new String("21")` into `"1221"` at compile time. – user207421 Jun 29 '23 at 09:53
  • @user207421 In that case whether `intern` is called would not matter and both cases should print true. – Sweeper Jun 29 '23 at 09:54
  • I think this is just unspecified behaviour. Nothing says this cannot happen. Nothing I can find says that the string literal has to be *in* the string pool by the time `intern` is called. `intern` could have added `new String("12") + new String("21")` to the string pool instead, and `"1221"` refers to the interned string. – Sweeper Jun 29 '23 at 09:58
  • @user207421 I have checked the bytecode, and it does at least execute the `new` instruction: `7 new #4`. And I have no reason to believe it was optimized away at the JIT stage. – DracoYu Jun 29 '23 at 10:34
  • @Sweeper So is there any difference between constant pool and string pool? [constant pool](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4) and [Run-Time Constant Pool](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-5.html#jvms-5.1). I have not seen string pool in JVM Specification. So I consider string pool as a part of run-time constant pool, and it was loaded at "Loading, Linking, and Initializing" stage while a class was loaded by classloader. – DracoYu Jun 29 '23 at 10:52
  • Oh wait, the answer to your question is literally in the second link: "If the method `String.intern` has previously been called on an instance of class `String` containing a sequence of Unicode code points identical to that given by the `CONSTANT_String_info` structure, then the result of string literal derivation is a `reference` to that same instance of class `String`." – Sweeper Jun 29 '23 at 10:58
  • @Sweeper That makes sense! The constant pool and the string pool are indeed different. The Unicode code points in the bytecode constant pool are loaded into the run-time constant pool when the class is loaded, but the corresponding String instance is not yet created in the string pool. The constant pool holds just a Unicode sequence, while the string pool holds a String instance, which contains other fields in addition to the Unicode sequence. The String instance is created and added to the string pool when an `ldc` instruction first pushes the string constant from the constant pool, or when intern is called manually on a String instance. – DracoYu Jun 29 '23 at 14:40
  • @user16320675 Thank you so much for correcting my misunderstanding of "String Pool" and "Constant Pool". There is too much misinformation about it on the web. The name is indeed misleading - we usually call it the "string constant pool", and we do see CONSTANT_String_info in the constant pool of the bytecode. So in the above code, there are three identical "1221" sequences in memory. One is the string object on the heap created by calling `StringBuilder.toString()`, one is the string object added to the string pool by calling `intern()`, and the last is the Unicode sequence in the constant pool. – DracoYu Jun 29 '23 at 14:58
  • @user16320675 The last thing I'm not sure about is that I can't find the code or any entry point of the string pool in the String class source code, although [String](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/String.html#intern()) says "A pool of strings, initially empty, is maintained privately by the class String". There is no further description in the JVM specification. Maybe it depends on the specific JVM implementation, and maybe HotSpot treats the String class in a special way and does extra work compared to other classes. – DracoYu Jun 29 '23 at 15:09

3 Answers

2
    String one = new String("abc");
    String two = new String("abc");

    boolean res1 = one == two;      // false -> two different objects
    boolean res2 = one.equals(two); // true -> content identical

    // Put the string into the StringPool (if it is not there yet)
    // and get the pooled object back
    one = one.intern();
    two = two.intern();
    boolean res3 = one == two;      // true -> same object from the StringPool
    boolean res4 = one.equals(two); // true -> content identical

    // The literals "12" and "21" resolve to instances in the StringPool;
    // each `new String(...)` then creates a separate object in Heap
    String one = new String("12");
    String two = new String("21");

    // Concatenating two strings at run time creates a new object in Heap;
    // the result is NOT put into the StringPool automatically
    String joined = one + two;

    // Same concatenation, copied into yet another object in Heap
    String three = new String(one + two);

    // The literal resolves to the instance in the StringPool
    String four = "1221";

    boolean res1 = joined == four; // false -> `joined` is in Heap, `four` is in StringPool
    boolean res2 = three == four;  // false -> `three` is in Heap, `four` is in StringPool

    // Put string into StringPool (if not there yet) and retrieve the pooled instance
    three = three.intern();

    boolean res3 = three == four;  // true -> both objects are from StringPool
Oleg Cherednik
1

I am the one who asked the question.

Thanks to the discussion with @Sweeper and @user16320675, I have a new understanding of this problem, which I share here.

The mistake was in points 2 and 6 of my understanding: the string pool is not populated when the class is loaded. s1.intern() is what first adds "1221" to the string pool, and the later String s2 = "1221" behaves differently depending on whether "1221" already exists in the string pool.
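A sketch that puts the orderings side by side. The class and method names, and the extra literals "3443" and "5665", are made up for this example so that the cases do not share a pool entry.

```java
public class InternOrderDemo {
    // No intern(): the literal resolves to its own pooled instance,
    // so it can never be the heap object built by the concatenation.
    static boolean withoutIntern() {
        String s1 = new String("34") + new String("43");
        String s2 = "3443";
        return s1 == s2;
    }

    // Using the return value removes the dependence on ordering: the literal
    // and intern() always agree on the single pooled instance.
    static boolean usingReturnValue() {
        String s1 = new String("56") + new String("65");
        String pooled = s1.intern();
        String s2 = "5665";
        return pooled == s2;
    }

    public static void main(String[] args) {
        // Same statements as the first snippet in the question. The question
        // reports true on Java 8, where this intern() call is the first time
        // "1221" reaches the string pool.
        String s1 = new String("12") + new String("21");
        s1.intern();
        String s2 = "1221";
        System.out.println(s1 == s2);

        System.out.println(withoutIntern());    // false
        System.out.println(usingReturnValue()); // true
    }
}
```

The first printed value depends on nothing having pooled "1221" before that intern() call, which is why code that needs the canonical instance should use the value returned by intern() rather than rely on this ordering.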

To explain the problem better, I first define the key concepts involved.

Key concepts

  • Constant pool: a data structure in the class file that stores the constants, strings, classes, fields, methods, interfaces, parameter types, etc. used in the source code. It is stored in the bytecode file on disk.
  • Runtime constant pool: the in-memory form of the constant pool while the program is running. When the class is loaded, the constant pool data is loaded into the JVM method area to form the runtime constant pool.
  • CONSTANT_String_info: a data structure in the constant pool that stores the Unicode sequence of a string literal from the source code.
  • String pool: the set of interned String instances, which in JDK 8 live in the heap. It is used to look up String instances that are already in use.
  • ldc #5: pushes constant No. 5 from the runtime constant pool onto the operand stack. For a string literal, it first checks whether the string pool already holds a String instance with the same content. If it does, a reference to that instance is pushed onto the stack; if it does not, a String instance is created in the string pool and a reference to it is pushed onto the stack.
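The lookup described for ldc and the behavior of intern() can be imitated with a toy model. This is only a sketch under stated assumptions: `ToyStringPool`, `resolveLiteral` and the use of a plain HashMap are all made up for illustration and say nothing about how HotSpot actually implements its string table.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyStringPool {
    // Toy stand-in for the string pool: content -> canonical instance.
    private final Map<String, String> table = new HashMap<>();

    // Models String.intern(): register the given instance unless a string
    // with equal content is already present, then return the pooled one.
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Models the first ldc of a CONSTANT_String_info entry: reuse the pooled
    // instance if one exists, otherwise create a String and pool it.
    String resolveLiteral(char[] codeUnits) {
        return intern(new String(codeUnits));
    }

    // intern() runs before the literal is resolved: the literal ends up
    // referring to the very object that was interned.
    static boolean internThenLiteral() {
        ToyStringPool pool = new ToyStringPool();
        String s1 = new StringBuilder("12").append("21").toString();
        pool.intern(s1);
        String s2 = pool.resolveLiteral("1221".toCharArray());
        return s1 == s2;
    }

    // No intern(): resolving the literal creates its own instance.
    static boolean literalOnly() {
        ToyStringPool pool = new ToyStringPool();
        String s1 = new StringBuilder("12").append("21").toString();
        String s2 = pool.resolveLiteral("1221".toCharArray());
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println(internThenLiteral()); // true
        System.out.println(literalOnly());       // false
    }
}
```

In this model the class file only contributes the character data passed to `resolveLiteral`; no String instance exists for the literal until it is first resolved, which matches the two results in the question.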

Cause of the mistake

The mistake comes from misunderstanding the relationship between the string pool and the constant pool (from here on I use "constant pool" for both the constant pool and the runtime constant pool).

Although it is usually called the string constant pool, the string pool is not part of the constant pool, so it is not populated when a class is loaded. In JDK 6, both the string pool and the constant pool were located in the permanent generation, so there seemed to be some relationship between them. But since JDK 7 (and therefore in JDK 8) the string pool lives in the heap. It is less a part of the constant pool than a part of the String class. It can be thought of as a private member of the String class, although it cannot be seen in the String source code.

Once a String instance in the string pool has been created, the character data in the instance cannot be changed. Any operation that appears to modify an existing String instance produces a new String instance instead. Strings therefore behave like constants, which is why the pool is usually called the string constant pool. To avoid confusing the string pool with the constant pool, I try to say "string pool" instead of "string constant pool".

Another concept that is easily confused with it is CONSTANT_String_info in the constant pool. String literals are stored there as Unicode sequences and are loaded into the runtime constant pool when the class is loaded. But this is fundamentally different from the string pool: CONSTANT_String_info only stores a Unicode sequence, while the string pool stores String instances. A String instance contains not only the Unicode sequence but also other fields, such as hash, and the String class has many methods that cannot be invoked on a CONSTANT_String_info. The corresponding String instance can be created by constructing a String from the Unicode sequence in CONSTANT_String_info.

DracoYu
1

Here is a short explanation. First, the == operator is true only if the two compared strings are actually the same instance of the String class. For two different instances of the String class that hold the same content, the result is false. So if you really want to compare the content of two Strings you MUST use the equals() method of the String class. Now if you write the following code:

    String s1 = "test";
    // s1.intern();
    String s2 = "test";
    System.out.println(s1 == s2); // true

Even if you don't invoke s1.intern(), both literals refer to the same instance: the Java Language Specification (§3.10.5) guarantees that string literals with the same content are interned, so s2 is assigned the same instance as s1 and s1 == s2 is true. Now if you run the following code:

    String s1 = "test";
    s1.intern();
    String s2 = new String("test");
    System.out.println(s1 == s2); // false

This is because with new String("test") you force the creation of a new instance of String regardless of what already exists in the internal pool.
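The cases above can be collected into one small program. This is a sketch with made-up class and method names; it also includes a compile-time constant expression, which the compiler folds into a single literal.

```java
public class LiteralIdentityDemo {
    // Two literals with the same content refer to the same instance.
    static boolean sameLiteralTwice() {
        String s1 = "test";
        String s2 = "test";
        return s1 == s2;
    }

    // "te" + "st" is a constant expression, folded at compile time into "test".
    static boolean constantExpression() {
        String s = "te" + "st";
        return s == "test";
    }

    // new String(...) always creates a separate object.
    static boolean newStringVersusLiteral() {
        String s1 = "test";
        String s2 = new String("test");
        return s1 == s2;
    }

    // intern() on the copy returns the pooled instance the literal refers to.
    static boolean internedCopyVersusLiteral() {
        return new String("test").intern() == "test";
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralTwice());          // true
        System.out.println(constantExpression());        // true
        System.out.println(newStringVersusLiteral());    // false
        System.out.println(internedCopyVersusLiteral()); // true
    }
}
```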

Michael Gantman