
So whilst debugging some peaks in memory usage on our prod environment, I started periodically taking thread dumps and found the following entry:

"C2 CompilerThread0" #7 daemon prio=9 os_prio=0 cpu=208687.03ms elapsed=10057.72s tid=0x00007efed95b29d0 nid=0x2a runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   Compiling: 41155       4       my.class::myFunction (384 bytes)

This entry is present in a lot of follow-up dumps (spanning up to 50 seconds), while according to VM.native_memory summary the compiler is taking up memory in the range of 1G - 1.3G (across the various observed cases).
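
For reference, the data is collected on the running pod roughly like this (PID 1 is just the usual container PID; the second command only returns something if NMT was enabled at startup via -XX:NativeMemoryTracking=summary):

jcmd 1 Thread.print
jcmd 1 VM.native_memory summary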

So, the thread dump from before the compilation starts is

"C2 CompilerThread0" #7 daemon prio=9 os_prio=0 cpu=185150.71ms elapsed=10028.85s tid=0x00007efed95b29d0 nid=0x2a waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

The last one before finishing

"C2 CompilerThread0" #7 daemon prio=9 os_prio=0 cpu=247100.38ms elapsed=10096.32s tid=0x00007efed95b29d0 nid=0x2a runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   Compiling: 41155       4       my.class::myFunction (384 bytes)

and the one after finishing

"C2 CompilerThread0" #7 daemon prio=9 os_prio=0 cpu=255511.25ms elapsed=10105.79s tid=0x00007efed95b29d0 nid=0x2a waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

So before the compilation starts, the CPU time increases between thread dumps (taken roughly every 10 seconds) only in the range of 0.xx ms, which is negligible.

Yet during the compilation it rose by roughly 70 seconds (255511.25 ms - 185150.71 ms), which seems insane. The ID 41155, which apparently is assigned per compilation task and should therefore never repeat, is also the same across the relevant thread dumps, so this really is one single compile task. What I have yet to figure out is what the (384 bytes) at the end means - it stays the same as well.

This function is per se not special - it is called plenty of times across several pods and seems to be absolutely fine in 99% of the cases - these peaks are super rare occasions, but we can't seem to find a root cause.

What might be relevant information here is that the VM.native_memory dump shows the memory wandering from Compilation to Arena Chunks, where it is then quickly released and memory usage drops back to normal levels.
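
For completeness, that movement between NMT categories can be tracked more precisely with NMT's baseline/diff mode, which prints the per-category delta since the baseline was taken:

jcmd 1 VM.native_memory baseline
jcmd 1 VM.native_memory summary.diff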

I know there was a memory leak on Java 17.0.4, but we are running OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (build 17.0.7+7, mixed mode, sharing) so this shouldn't be the issue at hand.

The question here is: where would one go from here? I've added several log statements around this function in the application, but can't seem to find any "misbehaviour".

Another question would of course be: how does the code behave? As in, if it's compiling, would we even get any logs until it's finished? The idea being that it would compile -> execute and therefore the logs would only "appear" afterwards.


After adding the flag -XX:CompileCommand=exclude,myClass::myFunction, the output of the thread dump is now a bit different:


"C2 CompilerThread0" #7 daemon prio=9 os_prio=0 cpu=244772.46ms elapsed=15463.68s tid=0x00007fe65e0299d0 nid=0x2c runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   Compiling: 42704   !   4       myClass::deliverMessage (318 bytes)

This helps me narrow it down a bit, but the question is, why is there a ! in the output?
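
So the next step was to exclude that method from compilation as well; both excludes are plain JVM arguments (class and method names are the anonymised ones from above):

-XX:CompileCommand=exclude,my.class::myFunction
-XX:CompileCommand=exclude,myClass::deliverMessage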


One level deeper we go! The peak happened again, and now with deliverMessage blocked from JIT, we reach the next level:

"C2 CompilerThread0" #7 daemon prio=9 os_prio=0 cpu=209412.09ms elapsed=5769.99s tid=0x00007f855b8b69d0 nid=0x2c runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   Compiling: 41589       4       my.kafka.Payload::toString (288 bytes)

So this seems to be the "root" that is triggering the JIT to eat up memory and take long compilation times. This toString is rather big, with roughly 70 variables printed out, and it's all done with the following schema:

return "KafkaPayload{" +
    "id=" + id +
    ", externalId='" + externalId + '\'' +
    ...
    ", eSignatureBase64Image='" + (shouldContainESignatureBase64Image ? eSignatureBase64Image : "censored") + '\'' +
    ...
} 

Interesting here is that the big boy, namely eSignatureBase64Image, is censored out, as it gets relatively large. Could it be that even though it's not printed out, the JIT still has an internal conflict and somehow tries to optimize it?
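
A minimal standalone sketch of that shape (field names and sizes are made up here, and it is nowhere near the real ~70 fields) which could be run with -XX:+PrintCompilation to watch how long C2 spends on toString:

// Made-up reproducer of the generated-style toString chain; run with:
//   java -XX:+PrintCompilation ToStringRepro
public class ToStringRepro {
    // Stand-ins for the real payload fields; the real class has ~70 of them.
    private final String id = "42";
    private final String externalId = "ext-42";
    private final String eSignatureBase64Image = "x".repeat(100_000);
    private final boolean shouldContainESignatureBase64Image = false;

    @Override
    public String toString() {
        // Same concatenation schema as the real KafkaPayload#toString.
        return "KafkaPayload{" +
                "id=" + id +
                ", externalId='" + externalId + '\'' +
                ", eSignatureBase64Image='" + (shouldContainESignatureBase64Image ? eSignatureBase64Image : "censored") + '\'' +
                '}';
    }

    public static void main(String[] args) {
        ToStringRepro payload = new ToStringRepro();
        long sink = 0;
        // Call toString often enough to push it through C1 and eventually C2 (tier 4).
        for (int i = 0; i < 1_000_000; i++) {
            sink += payload.toString().length();
        }
        System.out.println(sink);
    }
}

If the tier-4 compile of toString already takes unusually long there, that would at least point at the shape of the method rather than anything specific to our payload data.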

  • The (384 bytes) refers to the bytecode size of the method being compiled. Please specify what Java version you're running. – boneill May 26 '23 at 16:04
  • Did you search for this issue? There is information out there. – aled May 26 '23 at 17:42
  • Try to disable compilation of the problematic method: `-XX:CompileCommand=exclude,my.class::myFunction` – apangin May 26 '23 at 23:02
  • @boneill We are using `OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (build 17.0.7+7, mixed mode, sharing)` @aled I've searched far and wide, and had to read up on a lot of low level stuff I wouldn't have thought I'd ever have to read up on, but I have yet to find anything concrete on this issue. – Ricardo May 30 '23 at 06:42
  • @apangin good point, added this as flag now to a separate pod to see how it behaves – Ricardo May 30 '23 at 06:43

0 Answers