I am running jvm with -XX:+PrintSafepointStatistics, -XX:+PrintGCApplicationStoppedTime and -XX:+PrintGCApplicationConcurrentTime among others to investigate gc pauses. I noticed that
The vm pauses are much larger than gc pauses at 90% of total.
While most of the pauses are of low latency, I found a couple of vm pauses stopping my application for a very large duration.
On further investigation using PrintSafepointStatistics flag (as suggested in this blog: http://jpbempel.blogspot.in/2013/03/safety-first-safepoints.html), I found a particular vm-op "PrintThreads" causing pauses as long as 4.5 and 1.3 seconds.
I wonder what is triggering this vmop and why is it taking up so much time. If it helps, below is the list of all jvm flags I am using:
JAVA_OPTS="$JAVA_OPTS -XX:+UseParallelOldGC"
CATALINA_OPTS="-Xms2G -Xmx4G -XX:MaxPermSize=256m -Xloggc:/path/to/tomcat/apache-tomcat-8.0.18/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:MaxNewSize=1638m -XX:NewSize=1638m -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
Application stop time output in gc.log (Relevant portion):
1649.008: Application time: 0.0001730 seconds
1649.009: Total time for which application threads were stopped: 0.0003760 seconds
**1650.805**: Application time: 1.7963630 seconds
1655.315: Total time for which application threads were stopped: **4.5099710 seconds**
1655.315: Application time: 0.0000350 seconds
1655.316: Total time for which application threads were stopped: 0.0005540 seconds
1655.316: Application time: 0.0001600 seconds
Corresponding Safepoint statistics in catalina.out (Relevant portion):
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1649.008: RevokeBias [ 165 0 0 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
**1650.805**: PrintThreads [ 168 0 0 ] [ 0 0 0 0 4509 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1655.315: PrintJNI [ 168 0 0 ] [ 0 0 0 0 0 ] 0
Last timestamp before the application threads stopped for 4.5 seconds is 1650.805. This timestamp corresponds to "PrintThreads" vmop in catalina.out. I have verified this observation at multiple instances. Pauses greater than 1s are invariably being caused by "PrintThreads" vmop.
Thanks in advance.