
The JVM can easily update references held in local variables, static fields, instance fields or object arrays when it moves an object in the heap. But how can it update the references that have been pushed onto the operand stack?

neoexpert

1 Answer


There is no fundamental difference between a local variable and an entry in the operand stack. Both live in the same stack frame. Neither is formally declared and both need the JVM to perform inference to recognize their actual use.

The following code

public static void example() {
    {
        int foo = 42;
    }
    {
        Object bar = "text";
    }
    {
        long x = 100L;
    }
    {
        Object foo, bar = new Object();
    }
}

will (typically) get compiled to

  public static void example();
    Code:
       0: bipush        42
       2: istore_0
       3: ldc           #1                  // String text
       5: astore_0
       6: ldc2_w        #2                  // long 100l
       9: lstore_0
      10: new           #4                  // class java/lang/Object
      13: dup
      14: invokespecial #5                  // Method java/lang/Object."<init>":()V
      17: astore_1
      18: return

Note how the local variable at index 0 in the stack frame gets reassigned with values of different types. As a bonus, the last store to variable index 1 invalidates the variable at index 0: the `long` value occupied indices 0 and 1, so index 0 would otherwise contain a dangling half of a long value.
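To make the slot reuse concrete, here is a minimal Java sketch of the kind of per-slot type tags a verifier-like analysis tracks. The names `SlotType`, `LocalSlots`, `store` and `storeWide` are hypothetical and the type system is heavily simplified; the point is only that a `long` occupies two slots and that overwriting either half makes the other half unusable.

```java
import java.util.Arrays;

// Hypothetical, simplified type tags for local variable slots.
enum SlotType { TOP, INT, LONG, LONG_HIGH, REFERENCE }

final class LocalSlots {
    private final SlotType[] slots;

    LocalSlots(int maxLocals) {
        slots = new SlotType[maxLocals];
        Arrays.fill(slots, SlotType.TOP); // TOP = nothing usable stored yet
    }

    // Store a category-1 value (int, float, reference) into slot i.
    void store(int i, SlotType type) {
        invalidateOverlaps(i);
        slots[i] = type;
    }

    // Store a category-2 value (long, double): it occupies slots i and i + 1.
    void storeWide(int i) {
        invalidateOverlaps(i);
        invalidateOverlaps(i + 1);
        slots[i] = SlotType.LONG;
        slots[i + 1] = SlotType.LONG_HIGH;
    }

    // Overwriting either half of an existing long leaves the other half unusable.
    private void invalidateOverlaps(int i) {
        if (slots[i] == SlotType.LONG_HIGH && i > 0) slots[i - 1] = SlotType.TOP;
        if (slots[i] == SlotType.LONG && i + 1 < slots.length) slots[i + 1] = SlotType.TOP;
    }

    SlotType get(int i) { return slots[i]; }

    // Replays the stores of example(): istore_0, astore_0, lstore_0, astore_1.
    static LocalSlots replayExample() {
        LocalSlots s = new LocalSlots(2);
        s.store(0, SlotType.INT);       // istore_0
        s.store(0, SlotType.REFERENCE); // astore_0
        s.storeWide(0);                 // lstore_0 occupies slots 0 and 1
        s.store(1, SlotType.REFERENCE); // astore_1 clobbers the long's second half
        return s;
    }
}
```

Replaying the four store instructions through `replayExample()` leaves slot 0 as `TOP` (unusable) and slot 1 as `REFERENCE`, matching the bytecode listing.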

There are no additional hints about the types of local variables: debugging information is optional, and stack map tables are only present when the code contains branches.

The only way to determine whether a local variable contains a reference is to follow the program flow and retrace the effect of the instructions. This already implies inferring the values on the operand stack, as without that, we wouldn’t even know what a store instruction put into the variable.
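A minimal sketch of that flow-following inference for straight-line code, assuming Java 16+ (records and arrow `switch`). The op names such as `push_int` and `init` are made up for this sketch rather than real JVM mnemonics, the operand stack holds one entry per value (the JVM itself counts a `long` as two units of stack depth), and there is no branch merging or uninitialized-object tracking as a real verifier would need. It records the inferred state before every instruction, which is what a garbage collector would need to tell references apart from other values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified verifier-style types; LONG2 marks the second slot of a long local.
enum VType { TOP, INT, LONG, LONG2, REF }

// One instruction; the op names are hypothetical, not real mnemonics.
record Insn(int pc, String op, int arg) {}

// Abstract state: one type per local slot plus one entry per operand stack value.
record FrameState(VType[] locals, List<VType> stack) {
    FrameState copy() { return new FrameState(locals.clone(), new ArrayList<>(stack)); }
}

final class FlowAnalyzer {
    // Returns the inferred state *before* each instruction of straight-line code.
    static List<FrameState> analyze(List<Insn> code, int maxLocals) {
        VType[] initial = new VType[maxLocals];
        Arrays.fill(initial, VType.TOP);
        FrameState cur = new FrameState(initial, new ArrayList<>());
        List<FrameState> before = new ArrayList<>();
        for (Insn in : code) {
            before.add(cur.copy());
            List<VType> s = cur.stack();
            VType[] l = cur.locals();
            switch (in.op()) {
                case "push_int" -> s.add(VType.INT);        // bipush
                case "push_ref", "new" -> s.add(VType.REF); // ldc String, new
                case "push_long" -> s.add(VType.LONG);      // ldc2_w
                case "dup" -> s.add(s.get(s.size() - 1));
                case "init" -> pop(s, VType.REF);           // invokespecial <init>()V
                case "istore" -> { pop(s, VType.INT); store(l, in.arg(), VType.INT); }
                case "astore" -> { pop(s, VType.REF); store(l, in.arg(), VType.REF); }
                case "lstore" -> {
                    pop(s, VType.LONG);
                    invalidate(l, in.arg());
                    invalidate(l, in.arg() + 1);
                    l[in.arg()] = VType.LONG;
                    l[in.arg() + 1] = VType.LONG2;
                }
                case "return" -> { }
                default -> throw new IllegalArgumentException("unsupported op " + in.op());
            }
        }
        return before;
    }

    private static void pop(List<VType> s, VType expected) {
        VType t = s.remove(s.size() - 1);
        if (t != expected) throw new IllegalStateException("expected " + expected + ", found " + t);
    }

    private static void store(VType[] l, int i, VType t) {
        invalidate(l, i);
        l[i] = t;
    }

    // Overwriting either half of a long invalidates the other half.
    private static void invalidate(VType[] l, int i) {
        if (l[i] == VType.LONG2) l[i - 1] = VType.TOP;
        if (l[i] == VType.LONG) l[i + 1] = VType.TOP;
    }

    // The bytecode of example(), expressed with the made-up op names.
    static List<Insn> exampleCode() {
        return List.of(
            new Insn(0, "push_int", 42), new Insn(2, "istore", 0),
            new Insn(3, "push_ref", 0),  new Insn(5, "astore", 0),
            new Insn(6, "push_long", 0), new Insn(9, "lstore", 0),
            new Insn(10, "new", 0),      new Insn(13, "dup", 0),
            new Insn(14, "init", 0),     new Insn(17, "astore", 1),
            new Insn(18, "return", 0));
    }
}
```

For the `example()` bytecode, the state before `invokespecial` (pc 14) has two references on the operand stack, and the state before `return` has slot 0 as `TOP` and slot 1 as `REF`.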

The verifier does exactly this (it’s even mandatory), and the garbage collector or any other supporting code of the JVM can do it too. An implementation may even have a single analyzing code keeping the type information of the first analysis, which would be the verification.

But even if this information is reconstructed every time the garbage collector needs it, the overhead would not be astronomical. The garbage collector runs only periodically, and it only needs this information for the methods currently being executed, i.e. the frames on the threads’ stacks. And all of this applies to interpreted execution only.
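The following hypothetical sketch (plain Java, no real JVM API) shows how such on-demand reconstruction answers the original question. Frames are modeled as flat arrays of raw words, locals followed by the operand stack, so an `int` and a heap address are indistinguishable by themselves. The `refMapAt` function stands in for re-running the flow analysis at the frame’s current pc, and the forwarding table from old to new addresses is likewise an assumption about how a moving collector might record relocations.

```java
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical interpreter frame: raw words, locals first, then the operand stack.
// A word holding the int 4096 looks exactly like a word holding address 0x1000.
final class InterpretedFrame {
    final long[] words;                    // used part only: locals + current stack
    int pc;
    final IntFunction<boolean[]> refMapAt; // stands in for re-running the flow analysis

    InterpretedFrame(long[] words, int pc, IntFunction<boolean[]> refMapAt) {
        this.words = words;
        this.pc = pc;
        this.refMapAt = refMapAt;
    }
}

final class RootScanner {
    // Invoked only while the world is stopped, and only for frames on some thread's stack.
    static int relocate(List<InterpretedFrame> activeFrames, Map<Long, Long> forwarding) {
        int updated = 0;
        for (InterpretedFrame f : activeFrames) {
            boolean[] isRef = f.refMapAt.apply(f.pc); // reconstructed now, not stored
            for (int i = 0; i < f.words.length; i++) {
                Long moved = isRef[i] ? forwarding.get(f.words[i]) : null;
                if (moved != null) {
                    f.words[i] = moved; // locals and stack entries are treated alike
                    updated++;
                }
            }
        }
        return updated;
    }

    // locals: [int 4096, ref 0x2000], operand stack: [ref 0x1000]
    static InterpretedFrame exampleFrame() {
        return new InterpretedFrame(new long[] {4096, 0x2000, 0x1000}, 14,
                pc -> new boolean[] {false, true, true});
    }
}
```

In `exampleFrame()`, local 0 holds the `int` 4096, which has the same bits as the relocated address `0x1000`; only the reference map keeps the collector from rewriting it, while the reference in local 1 and the one on the operand stack are both updated.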

When the JIT compiler generates code, it needs the type information anyway and can prepare information for the garbage collector, but it will do so only for certain points, called safepoints, where the generated code checks whether there’s a pending garbage collection. This implies that between these points, the data doesn’t need to be in a form the garbage collector understands, and the optimized code may assume that the garbage collector won’t relocate objects while it is processing them.
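A simplified, single-threaded model of this idea: compiled code keeps reference maps only at safepoint pcs (typical candidates are calls and loop back-edges), and a pending collection request is only honored once execution reaches one of them. `CompiledMethod`, `SafepointDemo` and the register maps are hypothetical; real JVMs use per-thread polling and considerably more machinery.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical compiled method: GC maps (which registers/slots hold references)
// exist only at safepoint pcs, not at every instruction.
final class CompiledMethod {
    final int codeLength;
    private final Map<Integer, boolean[]> gcMaps = new HashMap<>();

    CompiledMethod(int codeLength) { this.codeLength = codeLength; }

    void addSafepoint(int pc, boolean... refRegisters) { gcMaps.put(pc, refRegisters); }

    boolean isSafepoint(int pc) { return gcMaps.containsKey(pc); }

    boolean[] gcMapAt(int pc) {
        boolean[] map = gcMaps.get(pc);
        if (map == null) throw new IllegalStateException("no GC map at pc " + pc);
        return map;
    }
}

final class SafepointDemo {
    // Executes pcs in order starting at startPc. A GC request arriving at
    // gcRequestedAtPc is honored only at the next safepoint; returns that pc,
    // or -1 if the method ends first.
    static int runUntilSafepoint(CompiledMethod m, int startPc, int gcRequestedAtPc) {
        boolean gcPending = false;
        for (int pc = startPc; pc < m.codeLength; pc++) {
            if (pc == gcRequestedAtPc) gcPending = true;
            // Between safepoints, values may live in registers in a form the GC
            // cannot interpret; the thread simply does not stop there.
            if (gcPending && m.isSafepoint(pc)) return pc;
        }
        return -1;
    }
}
```

With safepoints at pc 5 and pc 15, a request arriving at pc 7 is honored at pc 15, and asking for a map at pc 7 fails because none was ever generated for it.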

It also implies that in compiled, optimized code, reachability might be entirely different from simple interpreted execution: unused variables might be absent, and even objects that are still in use from the source code’s point of view may be considered unused when the optimized code works with copies of their fields, e.g. in CPU registers.
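One API that exists because of this effect is `java.lang.ref.Reference.reachabilityFence` (Java 9+), for code that depends on an object staying reachable for as long as the source suggests. In the sketch below, the `Resource` class, its `int[]` standing in for native memory, and the `Cleaner` action are all hypothetical. Once `sum()` has read `data`, optimized code no longer needs `this`, so without the fence the cleaning action could run while the loop is still reading the array.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.Arrays;

final class Resource {
    private static final Cleaner CLEANER = Cleaner.create();
    private final int[] data; // stands in for memory freed by the cleaning action

    Resource(int size) {
        int[] d = new int[size];
        data = d;
        // The action must not capture 'this', or the object would never become unreachable.
        CLEANER.register(this, () -> Arrays.fill(d, -1));
    }

    int sum() {
        try {
            int[] d = data; // after this read, optimized code may no longer need 'this'
            int s = 0;
            for (int v : d) s += v;
            return s;
        } finally {
            Reference.reachabilityFence(this); // 'this' stays reachable until here
        }
    }
}
```

`new Resource(1000).sum()` returns 0 here; without the fence it would still almost always return 0, since the problem only shows up when a collection and the cleanup happen to fall into that narrow window.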

Holger
  • Are the types in the local variable table and on the operand stack at a specific execution point (program counter) always the same, independent of the previously executed branches? Maybe the JVM could build a type map table "ahead of time" for every method and every possible program counter value. – neoexpert Mar 03 '20 at 18:51
  • Yes, in fact that's exactly what it does during bytecode verification, which is performed when loading a class. – Antimony Mar 03 '20 at 19:25
  • That’s what I meant with “*An implementation may even have a single analyzing code keeping the type information of the first analysis*” which would typically be the information gathered during verification. However, you have to keep in mind that holding it for every instruction is a lot of data while the likelihood of being the current instruction during a stop-the-world phase is very low for an individual instruction. It would make more sense to only keep them for branch merge points (which stack map tables provide), method invocations, and allocation instructions and infer others on-the-fly. – Holger Mar 04 '20 at 07:43
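The compromise described in the last comment, keeping type states only at selected pcs and inferring the rest on the fly, can be sketched as follows (Java 10+). Assumptions: pcs are instruction indices rather than byte offsets, an abstract state is just a list of type names (locals first, then the operand stack), and checkpoints exist at every branch target so that the code replayed from the nearest checkpoint is straight-line. `SparseTypeMaps` and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

final class SparseTypeMaps {
    private final List<UnaryOperator<List<String>>> code; // transfer function per pc
    private final TreeMap<Integer, List<String>> checkpoints = new TreeMap<>();

    SparseTypeMaps(List<UnaryOperator<List<String>>> code) { this.code = code; }

    // Kept e.g. from verification, but only for selected pcs.
    void keep(int pc, List<String> stateBeforePc) { checkpoints.put(pc, List.copyOf(stateBeforePc)); }

    // Infers the state before 'pc' by replaying from the nearest earlier checkpoint.
    List<String> stateAt(int pc) {
        Map.Entry<Integer, List<String>> cp = checkpoints.floorEntry(pc);
        if (cp == null) throw new IllegalStateException("no checkpoint before pc " + pc);
        List<String> state = new ArrayList<>(cp.getValue());
        for (int i = cp.getKey(); i < pc; i++) state = code.get(i).apply(state);
        return state;
    }

    int replayedSteps(int pc) { return pc - checkpoints.floorKey(pc); }

    static UnaryOperator<List<String>> push(String type) {
        return s -> { List<String> n = new ArrayList<>(s); n.add(type); return n; };
    }

    static UnaryOperator<List<String>> pop() {
        return s -> { List<String> n = new ArrayList<>(s); n.remove(n.size() - 1); return n; };
    }

    // Moves the top of the operand stack into the given local slot.
    static UnaryOperator<List<String>> storeTo(int local) {
        return s -> { List<String> n = new ArrayList<>(s); n.set(local, n.remove(n.size() - 1)); return n; };
    }

    static SparseTypeMaps example() {
        SparseTypeMaps m = new SparseTypeMaps(List.of(
            push("int"),  // 0: e.g. bipush
            storeTo(0),   // 1: e.g. istore_0
            push("ref"),  // 2: new (allocation site, checkpoint kept)
            push("ref"),  // 3: dup
            pop(),        // 4: invokespecial <init>()V consumes one reference
            storeTo(0))); // 5: astore_0
        m.keep(0, List.of("top")); // method entry: one local, empty stack
        m.keep(2, List.of("int")); // before the allocation
        return m;
    }
}
```

Querying `stateAt(4)` replays two instructions from the checkpoint kept at the allocation site (pc 2) and yields `[int, ref, ref]`, without any state having been stored for pc 4.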