
I have some questions revolving around the garbage collection of string objects and literals and the string pool.

Setup

Looking at a code snippet, such as:

// (I am using this constructor on purpose)
String text = new String("hello");

we create two string objects:

  • "hello" creates one and puts it into the string pool
  • new String(...) creates another, stored on the heap

Garbage collection

Now, if text falls out of scope and nothing references the object created by new String(...) anymore, that object can be garbage collected, right?

But what about the literal in the pool? If it is not referenced by anyone anymore, can it be garbage collected as well? If not, why?

H3AR7B3A7
  • The literal "hello" is not created on the heap. It will be compiled into a readonly memory section. – h0r53 Oct 21 '21 at 16:14
  • I think the literal `"hello"` is not eligible for garbage collection. It is physically part of the class `New`. But I'm not sure. If you have a very large number of string literals, I'd suggest putting them in a file (use a resource file) and reading them at runtime. This way they are definitely eligible for garbage collection. – markspace Oct 21 '21 at 16:15
  • The literal `"hello"` is not eligible for garbage collection. The reason is, no dynamic memory was allocated for this literal. Yes, it is referenced when the `String` constructor copies it into the heap, but the literal itself does not live, and never lived, in the heap. It would be quite dangerous to try to garbage collect literals, because doing so would erase the memory that the literal originated from, which would either resize the file structure or simply "zero out" the data. The latter has no benefit, and the former would require adjusting offsets in the class file format. – h0r53 Oct 21 '21 at 16:19
  • Look in your compiled class file. You'll find the string literals there. Technically, as all Java code runs in the JRE, all class files are loaded into a form of dynamically managed memory, the permgen/MetaSpace. So from that you could suggest all Java code runs in the heap of the Java runtime. But this is different than a traditional heap that is managed by a Garbage collector. Newer versions of Java allow for unloading classes. This is in ways comparable to garbage collection, but at a class level. This again is different than the traditional heap. There is a useful distinction. – h0r53 Oct 21 '21 at 20:56

2 Answers

1

When we create a String via the new operator, the JVM creates a new object at runtime and stores it in the heap.

To be more specific, it will NOT be in the String Pool, which is a specialized part of the (heap) memory.

String text = new String("hello");

As soon as nothing references the object anymore, it is eligible for GC.

In contrast, the following would be stored in the string pool:

String a = "hello";

When we call a similar line again:

String b = "hello";

The same object will be used from the String Pool, and it will never be eligible for GC.

As to why:

To reduce the memory needed to hold all the String literals (and the interned Strings), since these literals have a good chance of being used many times over.
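
The identity behavior described above can be checked with a small sketch. The class and method names here are made up for illustration. It only compares references: two occurrences of the same literal yield one instance, while `new String(...)` yields a separate object with equal content.

```java
public class LiteralIdentity {

    // Two occurrences of the same literal refer to one shared String instance.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // new String(...) allocates a distinct object whose content equals the literal's.
    static boolean copyIsDistinctButEqual() {
        String a = "hello";
        String copy = new String("hello");
        return copy != a && copy.equals(a);
    }

    public static void main(String[] args) {
        System.out.println("literals share instance: " + literalsShareInstance());
        System.out.println("copy is distinct but equal: " + copyIsDistinctButEqual());
    }
}
```

Both methods return true. This says nothing about garbage collection by itself; it only shows which references point to the same object.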

H3AR7B3A7
  • Using `new String("hello")` doesn't save any memory in any case, because you're still passing in a literal `"hello"`, and that goes in the string pool. – user2357112 Oct 21 '21 at 17:19
  • Could you share a source? I thought that was what **intern()** was for... And it will only be added to the pool when intern is called on the object. – H3AR7B3A7 Oct 21 '21 at 17:20
  • *Your answer* is one source. Your answer says literals go in the string pool. The `"hello"` in `new String("hello")` is a literal, which goes in the string pool. – user2357112 Oct 21 '21 at 17:24
  • The assignment to a literal is a special case in Java, and just passing "hello" as a param is not the same thing. I can see how you would think that though. – H3AR7B3A7 Oct 21 '21 at 17:26
  • For another source, see [the Javadoc for `String::intern`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#intern()): "All literal strings and string-valued constant expressions are interned." – user2357112 Oct 21 '21 at 17:27
  • For more sources, see pretty much any Google hit for `Java string pool`. The assignment is not special. The literal is what matters. – user2357112 Oct 21 '21 at 17:28
  • If you wouldn't have to call text.intern() for it to be interned, why does it exist? And why does no diagram exist that says the new operator creates the object in the heap AND the literal in the pool? – H3AR7B3A7 Oct 21 '21 at 17:30
  • `intern` exists to intern strings created dynamically, like with string concatenation or `StringBuilder`. – user2357112 Oct 21 '21 at 17:31
  • `new` does not create the literal. `new` *receives* the literal, and creates a copy of it. (For strings, this is almost always a pointless operation. `String` probably shouldn't even *have* a copy constructor, but it was designed in Java 1.0, where they made a lot of design mistakes.) – user2357112 Oct 21 '21 at 17:34
  • "As soon as there is no more reference to the object it is eligible for GC, and no string "hello" will be anywhere in memory." - this is not true. The literal will remain in memory, just not on the heap. – h0r53 Oct 21 '21 at 17:36
  • The compiler translates Java source code to Java bytecode. It does not “*create a new object and store it in the heap space*”; that happens at runtime, not compile-time. – Holger Oct 25 '21 at 08:33
  • @h0r53 there’s a tenacious myth about objects for string literals not being on the heap. **All objects** are on the heap. That’s simply [a matter of definition](https://docs.oracle.com/javase/specs/jvms/se17/html/jvms-2.html#jvms-2.5.3): “*The heap is the run-time data area from which memory for all class instances and arrays is allocated.*”. The string pool only contains **references** to certain string objects. Being referenced by the pool doesn’t change the nature of the memory containing the object. To complicate matters, a string typically consists of two objects, the `String` and an array – Holger Oct 25 '21 at 08:44
1

The specification does not mandate a behavior. All it requires is that all string literals (and string-typed compile-time constants in general) expressing the same string evaluate to the same object at runtime.

JLS §3.10.5:

At run time, a string literal is a reference to an instance of class String (§4.3.3) that denotes the string represented by the string literal.

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.29) - are "interned" so as to share unique instances, as if by execution of the method String.intern (§12.5).

It's also repeated in JLS §15.29:

Constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern
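
A minimal sketch of the quoted rules follows, with class, field and method names invented for the example. It contrasts a constant expression, which the compiler folds and which is interned, with a string concatenated at runtime, which is a new object until `intern()` is called on it.

```java
public class ConstantExpressionInterning {

    // A static final String initialized with a literal is a compile-time constant.
    static final String PREFIX = "hel";

    // PREFIX + "lo" is a constant expression, so it denotes the interned "hello".
    static boolean constantExpressionIsInterned() {
        String folded = PREFIX + "lo";
        return folded == "hello";
    }

    // Concatenation involving a non-constant operand creates a new String at runtime.
    static boolean runtimeConcatenationIsNewObject() {
        String prefix = String.valueOf(new char[] {'h', 'e', 'l'}); // not a constant
        String built = prefix + "lo";
        return built != "hello" && built.equals("hello");
    }

    // intern() returns the canonical instance, which is the one the literal refers to.
    static boolean internYieldsCanonicalInstance() {
        String built = new StringBuilder("hel").append("lo").toString();
        return built.intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(constantExpressionIsInterned());
        System.out.println(runtimeConcatenationIsNewObject());
        System.out.println(internYieldsCanonicalInstance());
    }
}
```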

This implies that each Java implementation maintains a pool at runtime which can be used to look up the canonical instance of the string. But the pool doesn’t have to hinder garbage collection. If no other reference to the object exists, the string instance could be garbage collected, because a program cannot observe that a new string instance is constructed when the string is needed again.

Note that when you add strings to the pool manually, by invoking intern(), the string instances may indeed get garbage collected once they are otherwise unreachable.
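
One way to probe this is a sketch using a `WeakReference`. The class and method names are hypothetical, and the outcome depends on the JVM and collector: `System.gc()` is only a hint, so the program may report either result. The string is built at runtime so that no literal in the code keeps it reachable.

```java
import java.lang.ref.WeakReference;

public class InternedStringGc {

    // Interns a string built at runtime and returns only a weak reference to it,
    // so that no strong reference from this code survives the method call.
    static WeakReference<String> internDynamicString() {
        String dynamic = new StringBuilder("dyn-").append(System.nanoTime()).toString();
        String canonical = dynamic.intern();
        return new WeakReference<>(canonical);
    }

    public static void main(String[] args) throws InterruptedException {
        WeakReference<String> ref = internDynamicString();

        // Request collection a few times; the JVM is free to ignore the hint.
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }

        System.out.println(ref.get() == null
                ? "interned string was collected"
                : "interned string is still reachable (collection is not guaranteed)");
    }
}
```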

But in practice, common implementations such as the HotSpot JVM associate a reference from the code location to the string instance after the first execution, so the object is referenced by the code containing the string literal or compile-time constant. So the object associated with the string literal can only get garbage collected when the class itself gets garbage collected. This is only possible when its defining class loader and, in turn, all other classes defined by this loader are unreachable too.

For the application class loader, this is impossible. Class unloading can only happen for additional class loaders created at runtime. Then, the string instances created for compile-time constants within classes loaded by such a class loader may get garbage collected, provided they do not match constants in other code.

Holger