2

JVM hotspot logs processed and converted to CSV to plot graph which will show the time taken by different VM operations

We are facing a peculiar problem in our clustered application. After running the system for sometime, suddenly the application is freezing and couldn't find any clue what is causing this.

After enabling JVM hotspot logs we see that "ParallelGCFailedAllocation", "Revoke Bias" is taking more time.

Refer to attached graph which is plotted by parsing the hotspot logs and converted to csv. The graph shows at certain time the "ParallelGCFailedAllocation", "Revoke Bias" is spiking and take around 13 secs which is not normal. We are trying to find what is causing it to take so much time.

Anybody having clue on how to debug such issue?

Enviroment details:

32 core machine running in VMWare hypervisor.
Heap Size: 12GB
RHEL 7 with Open JDK 8
  • Possible duplicate http://stackoverflow.com/questions/25039960/large-sync-value-for-parallelgcfailedallocation – Peter Lawrey Sep 14 '16 at 12:04
  • @PeterLawrey has a good suggestion there. It would be useful to know a little bit more about your application's nature. What is it mainly doing? Any huge string processing (xml payloads etc.) involved? – bashnesnos Sep 14 '16 at 12:58
  • @bashnesnos It is an enterprise application deployed in JBoss container with clustered environment. The application has got many components which involves conversion of input xml into a different format. Also the nodes in the cluster are communicated using JBoss Remoting. Also JMS is in place for async communication between the Nodes. – Purushothama Yanamala Sep 15 '16 at 06:49
  • To add one more note, my application is time sensitive and if the application is not responding for more than 10 secs then the cluster assumes that the particular node is dead and it will remove it from the cluster view. – Purushothama Yanamala Sep 15 '16 at 06:52
  • Could you please specify how much time is spent in "Stopping threads took: xxx seconds"? It will be very helpful – qwwdfsad Sep 15 '16 at 14:34
  • Given sample lines from hotspot logs which are blocking for more than 12secs. – Purushothama Yanamala Sep 16 '16 at 06:28
  • vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count 281149.062: ParallelGCFailedAllocation [ 2799 0 2 ] [ 0 0 1 31 13001 ] 0 vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count 281204.312: RevokeBias [ 2835 1 6 ] [ 0 0 108 34 12236 ] 0 – Purushothama Yanamala Sep 16 '16 at 06:36
  • Could you please add GC log output for a spike period with flags -XX:+PrintGCDetails -XX:+PrintGCTimeStamps? It seems that your spikes start with ParallelGCAllocationFailed, and it might be caused by huge payloads. – bashnesnos Sep 19 '16 at 08:30
  • Here is another instance where the GC pause is around 8.8secs. 2016-09-21T07:01:42.975+0530: 49525.971: [Full GC (Ergonomics) [PSYoungGen: 46075K->0K(3954688K)] [ParOldGen: 8378813K->4538477K(8388608K)] 8424888K->4538477K(12343296K), [Metaspace: 359180K->359076K(1394688K)], 8.8093958 secs] [Times: user=97.54 sys=0.00, real=8.81 secs] – Purushothama Yanamala Sep 21 '16 at 16:35

1 Answers1

0

wow, you have about 2800 threads in your application, its too much! Also your heap is too huge, 4GB in young gen and 8 GB in old gen. What are you expecting in this case ?

From PrintSafepointStatistics output, you have no problems with safepoint sync, actually vm operation takes all time.

You can disable biased locking -XX:-UseBiasedLocking and use concurrent gc's (CMS\G1) instead of parallel old gc, maybe this will help you and reduce pauses a little bit, but the main problem is bad configuration and maybe code/design.

  • Use size-limited thread pools, ~2800 threads is too much
  • 12 GB is huge heap, also young gen should be not so big.
  • profile your apllication (JFR, yourkit, jprofiler, visualvm) can help you to find allocation hotspots.
  • also eclipse MAT can help you to analyze heap
  • if you want to trace revokeBias, add -XX:+TraceBiasedLocking
commit-man
  • 376
  • 1
  • 2
  • 5