0

I am running jvm with -XX:+PrintSafepointStatistics, -XX:+PrintGCApplicationStoppedTime and -XX:+PrintGCApplicationConcurrentTime among others to investigate gc pauses. I noticed that

  1. The vm pauses are much larger than gc pauses at 90% of total.

  2. While most of the pauses are of low latency, I found a couple of vm pauses stopping my application for a very large duration.

On further investigation using PrintSafepointStatistics flag (as suggested in this blog: http://jpbempel.blogspot.in/2013/03/safety-first-safepoints.html), I found a particular vm-op "PrintThreads" causing pauses as long as 4.5 and 1.3 seconds.

I wonder what is triggering this vmop and why is it taking up so much time. If it helps, below is the list of all jvm flags I am using:

JAVA_OPTS="$JAVA_OPTS -XX:+UseParallelOldGC"

CATALINA_OPTS="-Xms2G -Xmx4G -XX:MaxPermSize=256m -Xloggc:/path/to/tomcat/apache-tomcat-8.0.18/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:MaxNewSize=1638m -XX:NewSize=1638m -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"

Application stop time output in gc.log (Relevant portion):

1649.008: Application time: 0.0001730 seconds  
1649.009: Total time for which application threads were stopped: 0.0003760 seconds  
**1650.805**: Application time: 1.7963630 seconds  
1655.315: Total time for which application threads were stopped: **4.5099710 seconds**  
1655.315: Application time: 0.0000350 seconds  
1655.316: Total time for which application threads were stopped: 0.0005540 seconds  
1655.316: Application time: 0.0001600 seconds  

Corresponding Safepoint statistics in catalina.out (Relevant portion):

         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
1649.008: RevokeBias                       [     165          0              0    ]      [     0     0     0     0     0    ]  0   

         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
**1650.805**: PrintThreads                     [     168          0              0    ]      [     0     0     0     0  4509    ]  0   

         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
1655.315: PrintJNI                         [     168          0              0    ]      [     0     0     0     0     0    ]  0   

Last timestamp before the application threads stopped for 4.5 seconds is 1650.805. This timestamp corresponds to "PrintThreads" vmop in catalina.out. I have verified this observation at multiple instances. Pauses greater than 1s are invariably being caused by "PrintThreads" vmop.

Thanks in advance.

the8472
  • 40,999
  • 5
  • 70
  • 122
  • can you add the safepoint statistics output? – the8472 Apr 09 '15 at 13:16
  • PrintThreads should just be printing a stack trace. It should be pretty fast, 4.5 seconds makes no sense. How many threads do you have? Where is it outputting to? If it is outputting to a slow location perhaps it is taking a while to flush the output? – Pace Apr 09 '15 at 15:25
  • @the8472 I have added the safepoint statistics. – tanujrastogi Apr 10 '15 at 08:15
  • @Pace The number of threads in this application is around ~90. When I am conducting the load test, the number of threads shoots up to around 190-200. I am not sure about where the stack trace, if any, is directed to. Could you tell me anything more about the PrintThreads vmop like when or why is it triggered? The application does maintain a log file where all user generated logs are maintained. During a heavy load test of about 40 minutes, it can get quite large (~650 MB). Is this relevant to "PrintThreads" vmop? – tanujrastogi Apr 10 '15 at 08:25

1 Answers1

1

Could you tell me anything more about the PrintThreads vmop like when or why is it triggered?

There are several ways to trigger it

  • sending SIGBREAK to the process / hitting Ctrl+Break in the console.
  • calling JVM_DumpAllStacks from native code
  • attaching a java agent (profilers/monitoring tools) that performs thread dumps
  • invocation via DiagnosticCommandMBean

Just speculating, but if whatever triggers the vmop writes it through a network connection and whatever is on the other end is slow then that might stall the entire VM.

the8472
  • 40,999
  • 5
  • 70
  • 122