
The goal is to understand what should be tuned in order for the Java process to stop restarting itself.

We have a Java Spring Boot backend application running Hazelcast that gets restarted instead of garbage collecting.
Environment is:

Amazon Corretto 17.0.3

The only memory tuning parameter supplied is:

`-XX:+UseContainerSupport -XX:MaxRAMPercentage=80.0`

The memory limit in Kubernetes is 2Gi, so the JVM gets a maximum heap of 1.6Gi (80% of 2Gi).
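As a sanity check on those numbers, a small snippet like the one below (a minimal sketch, not part of the application; the `HeapLimitCheck` class name is just for illustration) can confirm what maximum heap the JVM actually derived from the container limit:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapLimitCheck {
    public static void main(String[] args) {
        // With a 2Gi container limit and -XX:MaxRAMPercentage=80.0,
        // both values below should report roughly 1.6 GiB.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long maxHeapBytes = memory.getHeapMemoryUsage().getMax();
        System.out.printf("Max heap (MemoryMXBean): %.2f GiB%n",
                maxHeapBytes / (1024.0 * 1024 * 1024));
        System.out.printf("Runtime.maxMemory():     %.2f GiB%n",
                Runtime.getRuntime().maxMemory() / (1024.0 * 1024 * 1024));
    }
}
```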

Graphs of memory usage: [screenshots of Heap, Eden, and Old Gen]

The huge drop towards the end is where I performed a heap dump. Performing the dump led to a drastic decrease in memory usage (due to a full GC?).
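For what it's worth, that would be consistent with a live-object heap dump, which triggers a full GC before the file is written. A sketch of how the same kind of dump can be taken programmatically (the output path is hypothetical):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void dump(String path) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live = true dumps only reachable objects, which forces a full GC
        // first -- the likely cause of the drop seen in the graphs.
        diagnostics.dumpHeap(path, /* live = */ true);
    }

    public static void main(String[] args) throws IOException {
        dump("/tmp/app-heap.hprof"); // hypothetical output path
    }
}
```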

Heap dump results

The GC appears to be working against me here. If the heap dump is not performed, the container hits what appears to be a memory limit, is restarted by Kubernetes, and the cycle continues. Are there tuning parameters I have missed, or is this a clear memory leak (perhaps due to Hazelcast metrics, see https://github.com/hazelcast/hazelcast/issues/16672)?

  • While there's not a lot of long-term data, a jagged yet increasing usage of memory over time is usually a good indicator of more and more objects being unable to be reclaimed (i.e. a memory leak). If that's the case, in the short term _scheduling_ a restart at a reasonable time (e.g. 4am) may reduce the impact of this, since you can only grow memory for so long. – Rogue Jun 30 '22 at 13:37
  • What is the exact exception that you're seeing? `OutOfMemoryError` could be due to a number of reasons, e.g. `Java heap space` (where there really is not enough heap available), `GC overhead limit` (i.e. GC has been running but was unable to reclaim past a certain threshold in a given timeframe), and so on. The specific cause and fix depend on which error you're getting. – filpa Jan 10 '23 at 16:30
  • From the charts above, it looks like the heap keeps increasing, so there may be a leak somewhere. The usual way to analyze is to take a memory dump and investigate it with something like the Eclipse MAT plugin. Another useful, simpler tool that may give a hint is `jmap -histo`, which lists the objects using the most memory. – murtiko Jan 10 '23 at 17:04

1 Answer


The JVM decides which garbage collector (GC) to use based on the amount of memory and CPU given to the application. By default, it will use the Serial GC if the available RAM is under 2GB or there are fewer than 2 CPU cores. For a Kubernetes server application, the Serial GC is not a great choice: it runs in a single thread, it seems to wait until the heap is near the maximum limit before reclaiming heap space, and it causes a lot of application pauses, which can lead to health check failures or to scaling due to momentary spikes in CPU usage.

What has worked best for us is to force the use of the G1 collector. It is a concurrent collector that runs side by side with your application and does its best to minimize pausing. I would suggest setting your CPU limit to at least 1 and setting your RAM limit to however much you think your application is going to use, plus a little overhead. To force the G1 collector, add the following option to your Java command line: `-XX:+UseG1GC`.
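To confirm which collector was actually selected, you can start the JVM with `-Xlog:gc`, or run a small check like the following inside the application (a sketch; the exact bean names vary between JDK versions):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // With the Serial GC you will typically see "Copy" and "MarkSweepCompact";
        // with -XX:+UseG1GC you will see "G1 Young Generation" and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("GC in use: " + gc.getName());
        }
    }
}
```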