How GC finds GC roots and other object references

Question

According to the article How Garbage Collection Works, there are four kinds of GC roots:

Local variables are kept alive by the stack of a thread. This is not a real object virtual reference and thus is not visible. For all intents and purposes, local variables are GC roots.

Active Java threads are always considered live objects and are therefore GC roots. This is especially important for thread local variables.

Static variables are referenced by their classes. This fact makes them de facto GC roots. Classes themselves can be garbage-collected, which would remove all referenced static variables. This is of special importance when we use application servers, OSGi containers or class loaders in general. We will discuss the related problems in the Problem Patterns section.

JNI References are Java objects that the native code has created as part of a JNI call. Objects thus created are treated specially because the JVM does not know if it is being referenced by the native code or not. Such objects represent a very special form of GC root, which we will examine in more detail in the Problem Patterns section below.

In the JVM specification, local variables in stack frames have no type; they are more or less an array of bytes, and it is the compiler's responsibility to generate type-specific instructions for those local variables, for instance iload, fload, aload, etc. So clearly the GC cannot find references to objects just by looking at the local variable section of the stack frames.

My questions are:

How does the GC find those roots at all?
How can the GC find local variables on the stack that are references to objects and not some other type of variable (for instance, variables that have been stored by iconst)?
Then how does the GC find the fields of those objects to build a tree of accessible objects?
Does it use instructions that are defined by the JVM itself to find those objects?
And lastly, what is the meaning of this sentence in the article?

This is not a real object virtual reference and thus is not visible

Stephen C · Accepted Answer · 2020-11-18T12:53:24.337

How does the GC find those roots at all?

A JVM provides an internal (C / C++) API for finding the roots. The JVM knows where the Java stack frames are, where the static frames for each Java class are, where the live JNI object handles are, and so on. (It knows because it was involved in creating them, and keeps track of them.)

How can the GC find local variables on the stack that are references to objects and not some other type of variable?

The JVM keeps information for each method that says which cells in each stack frame are reference variables. The GC can figure out which method each stack frame corresponds to ... just like fillInStackTrace can.
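As a rough illustration of that idea, here is a minimal sketch of a stack scan driven by per-method metadata. It is not how any real JVM lays out its data: `MethodInfo`, `Frame`, `findRoots` and the use of a `BitSet` are all hypothetical names and choices made for this example, and `long` values stand in for raw slot contents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class FrameRootScan {
    /** Hypothetical per-method metadata: bit i is set when local slot i holds a reference. */
    static final class MethodInfo {
        final String name;
        final BitSet referenceSlots;

        MethodInfo(String name, BitSet referenceSlots) {
            this.name = name;
            this.referenceSlots = referenceSlots;
        }
    }

    /** Hypothetical stack frame: untyped slots plus a link to the method that owns the frame. */
    static final class Frame {
        final MethodInfo method;
        final long[] slots;

        Frame(MethodInfo method, long[] slots) {
            this.method = method;
            this.slots = slots;
        }
    }

    /** Returns the values of all slots that the method metadata marks as references. */
    static List<Long> findRoots(List<Frame> stack) {
        List<Long> roots = new ArrayList<>();
        for (Frame frame : stack) {
            BitSet refs = frame.method.referenceSlots;
            for (int i = refs.nextSetBit(0); i >= 0; i = refs.nextSetBit(i + 1)) {
                long value = frame.slots[i];
                if (value != 0L) { // 0 stands in for null in this sketch
                    roots.add(value);
                }
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        BitSet refs = new BitSet();
        refs.set(1); // only slot 1 is a reference; slots 0 and 2 are ints
        MethodInfo info = new MethodInfo("example", refs);
        // Slot 0 holds the int 4096, bit-for-bit identical to the "address" in slot 1.
        Frame frame = new Frame(info, new long[] {4096L, 4096L, 7L});
        System.out.println(findRoots(List.of(frame))); // prints [4096]
    }
}
```

The point of the example is that slots 0 and 1 contain the same bits, yet only slot 1 is reported as a root, because the decision comes from the method's metadata and not from the value itself.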

(for instance constant)

That's not actually relevant. Constants (i.e. final fields) don't get special treatment.

Then how does the GC find the fields of those objects to build a tree of accessible objects?

The JVM keeps information for each class to say which of the static and instance fields are reference variables. There is a field in each object's header that refers to the class.

The whole process is called "marking", and it is described in the page you were looking at.
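To make the marking step concrete, here is a minimal sketch under simplifying assumptions. `ClassInfo`, `HeapObject` and `mark` are hypothetical names invented for this example, fields are modelled as an `Object[]`, and the traversal is a plain worklist loop; real collectors are far more elaborate.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MarkSketch {
    /** Hypothetical class metadata: the indexes of the fields that hold references. */
    static final class ClassInfo {
        final String name;
        final int[] referenceFields;

        ClassInfo(String name, int... referenceFields) {
            this.name = name;
            this.referenceFields = referenceFields;
        }
    }

    /** Hypothetical heap object: a header pointing at its class, plus untyped field slots. */
    static final class HeapObject {
        final ClassInfo klass; // plays the role of the class pointer in the object header
        final Object[] fields;

        HeapObject(ClassInfo klass, int fieldCount) {
            this.klass = klass;
            this.fields = new Object[fieldCount];
        }
    }

    /** Marks everything reachable from the roots, following only fields the class lists as references. */
    static Set<HeapObject> mark(List<HeapObject> roots) {
        // HeapObject does not override equals/hashCode, so this set compares by identity.
        Set<HeapObject> marked = new HashSet<>();
        Deque<HeapObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapObject obj = pending.pop();
            if (!marked.add(obj)) {
                continue; // already visited; this is what makes cycles terminate
            }
            for (int index : obj.klass.referenceFields) {
                Object target = obj.fields[index];
                if (target != null) {
                    pending.push((HeapObject) target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        ClassInfo node = new ClassInfo("Node", 1); // field 0 is an int, field 1 is a reference
        HeapObject a = new HeapObject(node, 2);
        HeapObject b = new HeapObject(node, 2);
        HeapObject c = new HeapObject(node, 2);
        a.fields[0] = 42;
        a.fields[1] = b;
        b.fields[0] = 7;
        b.fields[1] = a; // cycle between a and b
        c.fields[1] = a; // c points at a, but nothing points at c
        Set<HeapObject> live = mark(List.of(a));
        System.out.println(live.size());      // prints 2
        System.out.println(live.contains(c)); // prints false
    }
}
```

Note that the traversal never inspects field 0: the class metadata says it is not a reference, so its contents are irrelevant to reachability.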

Does it use instructions that are defined by the JVM itself to find those objects?

I'm not sure what you are asking. But "probably yes". The GC is a component of the JVM, so everything it does is "defined by the JVM".

And lastly what is the meaning of this sentence in the article?

This is not a real object virtual reference and thus is not visible

It might be saying that the thread's stack is not a Java object ... which is true. But I think you would need to ask the authors of that Ebook; see the bottom of https://www.dynatrace.com/resources/ebooks/javabook/ for their names.

You added this:

In the JVM specification, local variables in stack frames have no type; they are more or less an array of bytes, and it is the compiler's responsibility to generate type-specific instructions for those local variables, for instance iload, fload, aload, etc. So clearly the GC cannot find references to objects just by looking at the local variable section of the stack frames.

Actually, that is not true. As @Holger reminded me, the verifier infers the types of the cells in the stack frame by simulating the effects of the bytecodes that initialize them. In addition, each method in a classfile has a StackMapTable attribute containing information that is used to assist (and speed up) the verifier's type determination.

Later on, the GC can obtain the inferred type information from the JVM.
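The kind of inference described above can be sketched as a tiny abstract interpreter that tracks types instead of values. This is only an illustration for straight-line code: the `Op` subset, the `Insn` class and `inferLocals` are hypothetical, and a real verifier also handles branches, many more instructions, and a much richer type lattice.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class SlotTypeInference {
    enum Type { UNSET, INT, REFERENCE }

    /** A hypothetical, heavily reduced instruction set modelled on a few real bytecodes. */
    enum Op { ICONST, ILOAD, ISTORE, ACONST_NULL, ALOAD, ASTORE }

    static final class Insn {
        final Op op;
        final int slot; // local variable index; ignored by ICONST and ACONST_NULL

        Insn(Op op, int slot) {
            this.op = op;
            this.slot = slot;
        }
    }

    private static void expect(Type actual, Type wanted) {
        if (actual != wanted) {
            throw new IllegalStateException("expected " + wanted + " but found " + actual);
        }
    }

    /** Simulates the code on types only and returns the type of each local slot at the end. */
    static Type[] inferLocals(int maxLocals, Insn... code) {
        Type[] locals = new Type[maxLocals];
        Arrays.fill(locals, Type.UNSET);
        Deque<Type> stack = new ArrayDeque<>(); // operand stack of types, not values
        for (Insn insn : code) {
            switch (insn.op) {
                case ICONST:
                    stack.push(Type.INT);
                    break;
                case ACONST_NULL:
                    stack.push(Type.REFERENCE);
                    break;
                case ILOAD:
                    expect(locals[insn.slot], Type.INT);
                    stack.push(Type.INT);
                    break;
                case ALOAD:
                    expect(locals[insn.slot], Type.REFERENCE);
                    stack.push(Type.REFERENCE);
                    break;
                case ISTORE:
                    expect(stack.pop(), Type.INT);
                    locals[insn.slot] = Type.INT;
                    break;
                case ASTORE:
                    expect(stack.pop(), Type.REFERENCE);
                    locals[insn.slot] = Type.REFERENCE;
                    break;
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        Type[] locals = inferLocals(2,
                new Insn(Op.ICONST, -1),
                new Insn(Op.ISTORE, 0),
                new Insn(Op.ACONST_NULL, -1),
                new Insn(Op.ASTORE, 1));
        System.out.println(Arrays.toString(locals)); // prints [INT, REFERENCE]
    }
}
```

Because a slot can be overwritten with a value of a different type later in the same method, the inferred types belong to a particular point in the code, not to the slot for the whole method.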

(In theory the GC could also make use of the StackMapTable information to determine when local variables go out of scope ... within a method. But apparently it doesn't in HotSpot JVMs; see Does the StackMapTable affect the garbage collection behavior?)

The description of garbage collection in that Ebook is (deliberately) brief and high level. But that is true of most descriptions that you will find. The deep details are complicated.

If you really want (and need) to understand how GCs work, my advice is:

To find out how the current Java implementations work, read the OpenJDK source code.
Track down and read the Sun and Oracle research papers on Java GCs.
Get hold of a copy of a good textbook on Garbage Collection.

Thanks for the great answer. BTW, by constant I meant local variables that have been stored by an instruction like iconst (not the final keyword in Java). I edited my question to reflect that — Tashkhisi, Nov 17 '20 at 06:43
Those values are clearly primitive, but as you said, that is not clear from the garbage collector's point of view (by only looking at that cell). My question (which you answered very well) was that when the garbage collector kicks in, it cannot know whether a value stored in a local variable of a frame is a primitive or a reference to an object. Now you have made it clear that the JVM keeps other information about the cells in a stack frame and knows which of them are references to objects. — Tashkhisi, Nov 17 '20 at 07:07
That is incorrect. There is more information in the classfile. See my edit to my answer. Like I said, "the deep details are complicated" ... and if you want to fully understand it, you need to dive a lot deeper than the bytecode instruction set. — Stephen C, Nov 17 '20 at 07:14
Although I am learning a lot from this discussion, I did not add that; it was there from the beginning. So you are saying my interpretation of this paragraph of the JVM spec is wrong: The Java Virtual Machine expects that nearly all type checking is done prior to run time, typically by a compiler, and does not have to be done by the Java Virtual Machine itself. Values of primitive types need not be tagged or otherwise be inspectable to determine their types at run time, or to be distinguished from values of reference types. — Tashkhisi, Nov 17 '20 at 07:31
Well, that statement is a bit misleading, because other parts of the JVM spec talk about verification. However, I think that what it is saying is that **values** don't need to be tagged. This is referring to some other language implementations (and in some cases hardware) where values had a tag bit to distinguish pointer values from primitive values. (Examples: the LISP machines, and MIT CLU compilers from the 1980s.) — Stephen C, Nov 17 '20 at 07:39
So can I conclude that values don't have a type, but local variables in frames (which you call cells) have a specific type? — Tashkhisi, Nov 17 '20 at 07:46
No ..... What you should conclude is that values don't use / need tag bits to distinguish their types. Because the JVM and the GC can work out what the type of a value ought to be from the context. Values in Java always have a definite type. — Stephen C, Nov 17 '20 at 08:17
Small correction: the type of stack values is determined by inference, using the known initial state and modeling the effect of each instruction, as [also discussed here](https://stackoverflow.com/a/60513131/2711488). The `StackMapTable` attribute only describes branch merge points, to help the verifier. Between these points, the verifier uses inference, as it did for the entire method prior to Java 6. And while the GC could benefit from this information, it is not used in practice in the widespread HotSpot JVM, as discussed in [this question](https://stackoverflow.com/q/48960056/2711488) — Holger, Nov 17 '20 at 08:50
