
We have been struggling with what appear to be long stop-the-world pauses when using the G1 collector. I have read through the Oracle documentation, but I am still having trouble determining what is causing the long pauses and what to do about them. (GC logs below.)

Our instance is being monitored and I have information contained in the following graphs:

Graphs: a 12-second stop-the-world pause, and heap allocations over time.

We have another monitoring tool that pings the JVM, and it reported the JVM was unresponsive for 12 seconds at around the exact same time.

So that brings me to the question of what to do about this. The load on the server is very low, so this doesn't happen often, but it seems that over several hours the heap just grows and grows and then there is a huge GC event that can cause serious issues. Below is the configuration we are using for the GC:

   wrapper.java.additional.40=-XX:+UseG1GC

   wrapper.java.additional.44=-XX:+ScavengeBeforeFullGC


   wrapper.java.additional.50=-XX:+PrintGCCause
   wrapper.java.additional.51=-XX:+PrintGCDetails
   wrapper.java.additional.52=-XX:+PrintGCTimeStamps
   wrapper.java.additional.53=-XX:+PrintGCApplicationStoppedTime
   wrapper.java.additional.54=-XX:+PrintGCApplicationConcurrentTime
   wrapper.java.additional.55=-verbose:gc
   wrapper.java.additional.56=-Xloggc:../../../logs/gc.log
   wrapper.java.additional.57=-XX:+UseGCLogFileRotation
   wrapper.java.additional.58=-XX:NumberOfGCLogFiles=10
   wrapper.java.additional.59=-XX:GCLogFileSize=100M
   wrapper.java.additional.60=-XX:+PrintHeapAtGC
   wrapper.java.additional.61=-XX:+PrintTenuringDistribution
   wrapper.java.additional.62=-XX:+UseCompressedClassPointers
   wrapper.java.additional.63=-XX:+UseCompressedOops

Can anyone point me in the right direction here? Thanks!

GCEasy Analysis: http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjcvLS1nYyAoMSkubG9nLnppcC0tMTgtNDEtNDA=

Update: Metaspace graph


Update: GC Logs: https://dl.dropboxusercontent.com/u/3642047/gc.log.zip

Casey Jordan

1 Answer


I see at least two problems in your GC logs:

  1. There were 4 full collections invoked explicitly by System.gc() or Runtime.getRuntime().gc(). Each took around 10 seconds:

    277042.600: [Full GC (System.gc())  12G->1537M(5126M), 11.4203806 secs]
    

    You may want to add the -XX:+ExplicitGCInvokesConcurrent JVM flag to prevent System.gc() from causing stop-the-world full collections. (A small test program demonstrating the flag is sketched after this list.)

    It will also be useful to find out who calls System.gc() and, ideally, avoid the call altogether. For this purpose you may modify the Runtime.gc method by adding code that prints a stack trace. Then recompile Runtime.java and prepend the modified class to the bootstrap classpath with
    -Xbootclasspath/p:/path/to/yourpatch.jar

  2. Another problem is an extremely long safepoint synchronization pause not related to GC:

    5512447.686: Total time for which application threads were stopped: 16.4426008 seconds, Stopping threads took: 16.4414390 seconds
    

    This is typically caused by MappedByteBuffer I/O or by the Java process starting to swap to disk. See this and this answer regarding similar problems.
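Regarding the first item: the following is a minimal, hypothetical test program (SystemGcProbe is not part of your application) for verifying what -XX:+ExplicitGCInvokesConcurrent changes. Run it with and without the flag and compare the GC log lines produced by the explicit System.gc() call:

    // SystemGcProbe.java -- hypothetical probe, assumes JDK 8 with the G1 collector.
    // Run twice and compare the GC logs:
    //   java -XX:+UseG1GC -verbose:gc SystemGcProbe
    //       -> logs "Full GC (System.gc())", a stop-the-world full collection
    //   java -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -verbose:gc SystemGcProbe
    //       -> System.gc() starts a concurrent G1 cycle instead of a full STW collection
    public class SystemGcProbe {
        public static void main(String[] args) {
            byte[][] junk = new byte[1024][];
            for (int i = 0; i < junk.length; i++) {
                junk[i] = new byte[64 * 1024];   // allocate some soon-to-be garbage
            }
            junk = null;                         // make it unreachable
            System.gc();                         // explicit GC request, like the ones in your logs
        }
    }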

apangin
  • Awesome observations. The System.gc() calls should be easy to fix if we can track down which libraries are calling it. We also use a library that does a lot with memory-mapped files, sometimes 70 GB or more. Any suggestions on what to tune so the OS handles that better, besides forcing it all into RAM? Thanks! – Casey Jordan Jul 13 '17 at 16:23
  • @CaseyJordan Unfortunately you can't do much about it other than ensuring the file is in RAM, replacing mapped I/O with regular FileChannel I/O, or disabling Unsafe intrinsics in the JVM (the latter may significantly decrease throughput). – apangin Jul 15 '17 at 08:03
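Following up on the comment above about replacing mapped I/O: below is a rough sketch (not the library's actual code; the file path and chunk size are assumptions) of streaming a large file with FileChannel.read() into a reusable buffer instead of FileChannel.map(). The page faults then happen inside the blocking native read() call rather than while a Java thread is touching a MappedByteBuffer, which is one known way time-to-safepoint gets stretched:

    // FileChannelReadSketch.java -- hypothetical example of plain FileChannel I/O.
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class FileChannelReadSketch {
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);  // 1 MiB read buffer
                long total = 0;
                while (ch.read(buf) != -1) {                          // blocking read in native code
                    buf.flip();
                    total += buf.remaining();                         // process the chunk here
                    buf.clear();
                }
                System.out.println("Read " + total + " bytes");
            }
        }
    }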