
I have a web application that runs with the following environment and GC settings:

  1. openjdk version "11.0.6" 2020-01-14 LTS
  2. Tomcat 9
  3. -Xms2048M -Xmx6144M -XX:+UseG1GC -XX:ReservedCodeCacheSize=64M -XX:+UseStringDeduplication -XX:SoftRefLRUPolicyMSPerMB=10

The application makes heavy use of Nashorn, which I know creates a lot of memory pressure because it allocates many SoftReferences to optimize its PropertyMap history. Currently my application gets an OutOfMemoryError (Java heap space) a few times a week and I can't figure out why. According to the GC logs and a heap dump, everything seems fine.
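
To illustrate the pattern (a minimal, hypothetical sketch, not Nashorn's actual code): Nashorn holds its PropertyMap history through SoftReferences, and the JVM keeps softly reachable objects alive until it decides it is under memory pressure. A cache built the same way makes the heap look full in dumps and GC logs even though nearly everything is reclaimable:

import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a soft-reference cache: entries are reachable
// only through SoftReferences, so the JVM clears them only under memory
// pressure. Until then, they count as live heap in dumps and GC logs.
public class SoftRefPressure {
    public static void main(String[] args) {
        List<SoftReference<byte[]>> cache = new ArrayList<>();
        for (int i = 1; i <= 200_000; i++) {
            // ~12.8 GB in total if nothing were ever cleared
            cache.add(new SoftReference<>(new byte[64 * 1024]));
            if (i % 20_000 == 0) {
                long live = cache.stream().filter(r -> r.get() != null).count();
                System.out.printf("entries=%d live=%d cleared=%d%n",
                        cache.size(), live, cache.size() - live);
            }
        }
    }
}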

A heap dump taken at the OOM shows lots of soft/weak references, but if I trigger a GC on the dump in JProfiler or Eclipse Memory Analyzer, both reduce it from almost 6 GB to about 500 MB. Also, no single thread is allocating large amounts of memory at that point.

The GC logs show a similar picture. The heap grows over the entire day, mainly in SoftReferences, and once it hits the 6 GB cap a full GC runs. The full GC clears almost everything, but I still get an OOM 1-2 minutes later.

My guess is that the cleanup of the entire heap takes too long (almost 2 seconds according to the GC log) and the OOM happens in the meantime. But I don't understand how that's possible: shouldn't the full GC stop the world? I tried -XX:SoftRefLRUPolicyMSPerMB=10 to make the SoftReferences time out sooner, but it didn't really help. Why are the references accumulating over the entire day?
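
For what it's worth, here is the back-of-the-envelope math for that flag, assuming the documented HotSpot rule that a softly reachable object is cleared when its idle time exceeds SoftRefLRUPolicyMSPerMB times the free heap in MB (the server VM estimates free heap from -Xmx):

// Rough check of the soft reference timeout under the HotSpot rule:
//   (now - lastAccess) > SoftRefLRUPolicyMSPerMB * freeHeapMB
// where the server VM uses the maximum heap (-Xmx) as the free-heap estimate.
public class SoftRefTimeout {
    public static void main(String[] args) {
        long msPerMb = 10;      // -XX:SoftRefLRUPolicyMSPerMB=10
        long maxHeapMb = 6144;  // -Xmx6144M
        long timeoutMs = msPerMb * maxHeapMb;
        System.out.println("Idle time before clearing: ~" + timeoutMs / 1000 + " s");
    }
}

That comes out at roughly 61 seconds, so any soft reference that is touched at least once a minute never times out, and the check only runs during GC reference processing, which would explain why the references keep accumulating between collections.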

I uploaded the GC logs here if somebody wants an overview: GCEasy Analysis

Here is the GC log of the full GC and, what's also strange, the concurrent cycle GC(1175) that got aborted after a minute:

[38014.682s][info   ][gc,task       ] GC(1179) Using 4 workers of 4 for full compaction
[38014.682s][info   ][gc,start      ] GC(1179) Pause Full (G1 Humongous Allocation)
[38014.709s][info   ][gc,phases,start] GC(1179) Phase 1: Mark live objects
[38015.462s][info   ][gc,stringtable ] GC(1179) Cleaned string and symbol table, strings: 269648 processed, 116742 removed, symbols: 768892 processed, 127 removed
[38015.462s][info   ][gc,phases      ] GC(1179) Phase 1: Mark live objects 753.351ms
[38015.462s][info   ][gc,phases,start] GC(1179) Phase 2: Prepare for compaction
[38015.543s][info   ][gc,phases      ] GC(1179) Phase 2: Prepare for compaction 81.292ms
[38015.543s][info   ][gc,phases,start] GC(1179) Phase 3: Adjust pointers
[38015.725s][info   ][gc,phases      ] GC(1179) Phase 3: Adjust pointers 182.121ms
[38015.726s][info   ][gc,phases,start] GC(1179) Phase 4: Compact heap
[38015.828s][info   ][gc,phases      ] GC(1179) Phase 4: Compact heap 102.293ms
[38016.550s][info   ][gc,heap        ] GC(1179) Eden regions: 0->0(389)
[38016.550s][info   ][gc,heap        ] GC(1179) Survivor regions: 5->0(20)
[38016.550s][info   ][gc,heap        ] GC(1179) Old regions: 2949->255
[38016.550s][info   ][gc,heap        ] GC(1179) Humongous regions: 106->40
[38016.550s][info   ][gc,metaspace   ] GC(1179) Metaspace: 947300K->289843K(2306048K)
[38016.550s][info   ][gc             ] GC(1179) Pause Full (G1 Humongous Allocation) 6119M->579M(2048M) 1868.032ms
[38016.550s][info   ][gc,cpu         ] GC(1179) User=3.97s Sys=0.05s Real=1.86s
[38016.550s][info   ][gc,marking     ] GC(1175) Concurrent Mark From Roots 61386.527ms
[38016.550s][info   ][gc,marking     ] GC(1175) Concurrent Mark Abort
[38016.550s][info   ][gc,stringdedup ] Concurrent String Deduplication (38016.550s)
[38016.550s][info   ][gc             ] GC(1175) Concurrent Cycle 61403.148ms
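
A note on the trigger: the pause above is "Pause Full (G1 Humongous Allocation)". G1 allocates any object of at least half a region size as humongous, and with this heap the region size is presumably 2 MB (2949 old regions covering ~6 GB fits that), so allocations of around 1 MB already count as humongous. As a sketch (hypothetical class name) of how to check which region size the JVM actually picked from inside the application:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Prints the G1 region size the JVM chose; any allocation of at least
// half this size goes straight to the old generation as humongous.
public class RegionSize {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        long regionBytes =
                Long.parseLong(bean.getVMOption("G1HeapRegionSize").getValue());
        System.out.println("Region size: " + regionBytes / (1024 * 1024) + " MB");
        System.out.println("Humongous threshold: >= "
                + regionBytes / 2 / (1024 * 1024) + " MB");
    }
}

If the threshold really is that low, setting the region size explicitly with -XX:G1HeapRegionSize (for example 16M) would reduce how many allocations count as humongous.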

I hope somebody can point me in the right direction as to what the problem is and what I can do about it.

Kind Regards.

  • _...but I still get an OOM 1-2 minutes later_ - this looks like an indication of something else. Can you post logs _exactly_ when that happens? – Eugene Apr 29 '20 at 22:06
  • I think I was wrong with that statement. It's just that the application keeps running until it notices it has actually crashed. I think I've nailed it down to the default G1 settings mostly not being suitable for my workload. The region size is too small, so too much counts as humongous. Also, the pause-time target is too tight, so the GC never gets around to actually clearing much of the old space. The OOM occurs randomly, but I guess it happens when a humongous allocation happens during the full GC. – Lukas Stampf May 05 '20 at 08:19
  • I am not sure I follow you: when a humongous allocation happens, G1 will start a _concurrent_ cycle. Whether that ends up being a full GC, a mixed GC, or a young-only GC depends on a few factors. It might be that while a _concurrent_ phase is happening, your application allocates lots of humongous objects, thus taking away all the "free" space that is needed while that concurrent phase runs, but I am only guessing here. If that is the case, then you need more memory to give the GC more breathing space. – Eugene May 05 '20 at 13:35
