For some reason, G1 is increasing the committed old generation memory (although the used memory does not increase) and decreasing the committed Eden memory (and consequently the usable space). This seems to cause a spike in young generation GC runs and makes the application unresponsive for some time.

We can also see a spike in CPU usage and in the machine's total committed virtual memory (which grows beyond the total physical memory), along with a spike in disk usage and swap-out/swap-in.

My questions are:

  1. Is it likely that G1's decision to shrink Eden and drastically increase the committed old generation memory is causing all those spikes?
  2. Why is it doing that?
  3. How can I prevent it from doing that?

JVM version: Ubuntu, OpenJDK Runtime Environment, 11.0.11+9-Ubuntu-0ubuntu2.20.04

EDIT: It seems that the memory spike is caused by a sudden increase in the off-heap JVM direct buffer memory pool. The image below shows the values of 4 metrics: os_committed_virtual_memory (blue), node_memory_SwapFree_bytes (red), jvm_buffer_pool_used_direct (green) and jvm_buffer_pool_used_mapped (yellow). The values are in GB.

I'm still trying to find what is using this direct buffer memory and why it has such an effect on the heap memory.

[Image: graph of the four metrics listed above, in GB]
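For anyone who wants to watch the same pools from inside the process, here is a minimal sketch that reads them through the standard `java.lang.management.BufferPoolMXBean`. It assumes a HotSpot-based JVM, where the pools are named `direct` and `mapped`. The class name `BufferPoolReport` and the method `usedBytes` are hypothetical, made up for illustration. The `jvm_buffer_pool_used_*` metrics above presumably come from the same MXBean, but that is an assumption.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper, not part of the application in the question.
class BufferPoolReport {

    // Returns the bytes currently used by the named buffer pool
    // ("direct" or "mapped" on HotSpot), or -1 if no such pool exists.
    static long usedBytes(String poolName) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = usedBytes("direct");

        // Allocate 16 MiB off-heap; this memory is not part of the Java heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        long after = usedBytes("direct");
        System.out.printf("direct pool: before=%d bytes, after=%d bytes%n", before, after);

        // Touch the buffer so it stays reachable until after the measurement.
        buffer.put(0, (byte) 1);
    }
}
```

Logging these values periodically, together with `getCount()` on the same bean, can help narrow down when the direct pool starts to grow.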

Diego Marin Santos
  • I do not know the answer, but when G1 decreases the Eden size, it means it can reach the `MaxGCPauseMillis` target with fewer Eden regions. So basically G1 can do an entire cycle within 200ms with fewer regions, so it sort of "adapts". GC logs would actually show that, if you had them. – Eugene Sep 19 '21 at 01:57
  • btw, committed memory is memory that is not used at all, it is not even backed by physical pages, it is some portion of virtual memory that later can be used. The slow-down that you see comes because you have many back-to-back young collections (6 on the graph). I guess. – Eugene Sep 19 '21 at 03:19
  • What puzzles me is why G1 decided to do those things that actually made the pauses longer. Btw, I've updated my question with new information that I've found. – Diego Marin Santos Sep 20 '21 at 18:07
  • The spike is pretty big. When you want to allocate something off-heap (and those direct buffers are exactly that), but the allocation cannot be performed because of lack of space, the VM does a very interesting (and unusual) thing, see [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/Bits.java#L143). Even if you have enough heap space, a GC cycle will be triggered. That will _at least_ trigger a young collection, hopefully followed by a concurrent phase (depending on the `ExplicitGCInvokesConcurrent` flag). That seems like a plausible explanation. – Eugene Sep 24 '21 at 03:07

1 Answer

The issue was caused by a memory leak related to direct memory usage. An output stream was not being closed after being used.
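As a general illustration of the fix pattern (not the actual application code, which is not shown here), here is a minimal sketch using try-with-resources so that the stream is closed even when a write fails. The names `ClosedStreamExample`, `TrackingStream` and `send` are hypothetical. `TrackingStream` is only an in-memory stand-in that records whether `close()` was called.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical example class, made up for illustration.
class ClosedStreamExample {

    // In-memory stand-in for a real stream; remembers whether close() ran.
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Writes the payload and always closes the stream, even if write() throws.
    static void send(OutputStream out, byte[] payload) {
        try (OutputStream stream = out) {
            stream.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackingStream stream = new TrackingStream();
        send(stream, new byte[] {1, 2, 3});
        System.out.println("closed = " + stream.closed + ", bytes written = " + stream.size());
    }
}
```

With a stream that holds native or direct memory, skipping `close()` can leave that memory allocated until the object is garbage collected, if it is released at all.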

Diego Marin Santos