1

JDK version:

java version "1.7.0_91"
OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

GC/Mem parameters:

-Xms24g -Xmx24g -Xss256k -Djava.awt.headless=true -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=10000 -XX:InitiatingHeapOccupancyPercent=30 -XX:+ParallelRefProcEnabled -XX:G1HeapWastePercent=5 -XX:G1OldCSetRegionThresholdPercent=20 -XX:G1HeapRegionSize=16m -XX:G1MixedGCLiveThresholdPercent=80 -XX:+PrintFlagsFinal -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy

Occasionally, I see long young gc pauses longer than 1 minute. I checked the GC log and found they are all appear with a concurrent marking cycle initialization for a mixed gc event. Since the estimation pause of the young gc is very quick, my guess is that the concurrent marking cycle is impacting the normal young gc performance. How can I reduce the impact? I check through the gc log. Most of the concurrent marking cycle has no impact on the young gc, only very few of them has big impact. The gc log is like this:

   [Eden: 10.8G(10.8G)->0.0B(10.3G) Survivors: 160.0M->432.0M Heap: 21.3G(24.0G)->10.9G(24.0G)]
 [Times: user=3.61 sys=0.76, real=0.35 secs] 
 505008.424: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, reason: concurrent cycle initiation requested]
505008.424: [GC pause (young) (initial-mark) 505008.425: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 151866, predicted base time: 65.35 ms, remaining time: 9934.65 ms, target pause time: 10000.00 ms]
 505008.425: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 659 regions, survivors: 27 regions, predicted young region time: 206.46 ms]
 505008.425: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 659 regions, survivors: 27 regions, old: 0 regions, predicted pause time: 271.81 ms, target pause time: 10000.00 ms]
, 90.8122050 secs]
   [Parallel Time: 90688.1 ms, GC Workers: 23]
      [GC Worker Start (ms): Min: 505008426.7, Avg: 505008427.4, Max: 505008427.9, Diff: 1.2]
      [Ext Root Scanning (ms): Min: 2.7, Avg: 7.8, Max: 18.4, Diff: 15.7, Sum: 179.8]
      [Code Root Marking (ms): Min: 0.0, Avg: 0.3, Max: 4.2, Diff: 4.2, Sum: 7.2]
      [Update RS (ms): Min: 33.8, Avg: 47.4, Max: 61.1, Diff: 27.3, Sum: 1090.1]
         [Processed Buffers: Min: 22, Avg: 40.2, Max: 107, Diff: 85, Sum: 924]
      [Scan RS (ms): Min: 0.0, Avg: 0.2, Max: 0.6, Diff: 0.5, Sum: 5.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 85537.6, Avg: 87188.8, Max: 90638.2, Diff: 5100.6, Sum: 2005342.7]
      [Termination (ms): Min: 0.0, Avg: 3441.6, Max: 5092.4, Diff: 5092.4, Sum: 79155.8]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.4, Max: 0.8, Diff: 0.8, Sum: 10.1]
      [GC Worker Total (ms): Min: 90685.8, Avg: 90686.6, Max: 90687.3, Diff: 1.5, Sum: 2085790.9]
      [GC Worker End (ms): Min: 505099113.5, Avg: 505099113.9, Max: 505099114.3, Diff: 0.8]
   [Code Root Fixup: 0.2 ms]
   [Code Root Migration: 0.3 ms]
   [Clear CT: 1.6 ms]
   [Other: 122.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 105.3 ms]
      [Ref Enq: 4.8 ms]
      [Free CSet: 3.0 ms]
   [Eden: 10.3G(10.3G)->0.0B(128.0M) Survivors: 432.0M->1376.0M Heap: 21.2G(24.0G)->12.7G(24.0G)]
 [Times: user=82.22 sys=7.54, real=90.81 secs] 
505099.236: [GC concurrent-root-region-scan-start]
505099.311: [GC pause (young)505099.331: [GC concurrent-root-region-scan-end, 0.0943260 secs]
505099.331: [GC concurrent-mark-start]
 505099.331: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 322026, predicted base time: 159.64 ms, remaining time: 9840.36 ms, target pause time: 10000.00 ms]
 505099.331: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 8 regions, survivors: 86 regions, predicted young region time: 6030.20 ms]
 505099.331: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 8 regions, survivors: 86 regions, old: 0 regions, predicted pause time: 6189.84 ms, target pause time: 10000.00 ms]
 505101.745: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 28.00 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
, 2.4339110 secs]
   [Root Region Scan Waiting: 19.4 ms]
   [Parallel Time: 2385.4 ms, GC Workers: 23]
      [GC Worker Start (ms): Min: 505099330.9, Avg: 505099331.2, Max: 505099331.4, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 3.5, Avg: 12.8, Max: 17.0, Diff: 13.5, Sum: 295.5]
      [SATB Filtering (ms): Min: 0.0, Avg: 0.1, Max: 1.7, Diff: 1.7, Sum: 1.7]
      [Update RS (ms): Min: 2222.6, Avg: 2229.1, Max: 2266.2, Diff: 43.5, Sum: 51268.7]
         [Processed Buffers: Min: 41, Avg: 65.8, Max: 120, Diff: 79, Sum: 1514]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 0.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 104.2, Avg: 141.5, Max: 147.9, Diff: 43.7, Sum: 3254.6]
      [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.2, Diff: 0.2, Sum: 3.6]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.5, Max: 9.7, Diff: 9.7, Sum: 12.6]
      [GC Worker Total (ms): Min: 2383.9, Avg: 2384.3, Max: 2384.8, Diff: 0.8, Sum: 54837.8]
      [GC Worker End (ms): Min: 505101715.3, Avg: 505101715.4, Max: 505101715.7, Diff: 0.4]
   [Code Root Fixup: 0.3 ms]
   [Code Root Migration: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 28.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 19.7 ms]
      [Ref Enq: 0.3 ms]
      [Free CSet: 4.6 ms]
   [Eden: 128.0M(128.0M)->0.0B(1024.0M) Survivors: 1376.0M->192.0M Heap: 12.8G(24.0G)->12.9G(24.0G)]
 [Times: user=3.79 sys=1.10, real=2.43 secs] 
505103.032: [GC concurrent-mark-end, 3.7007580 secs]
505103.176: [GC remark 505103.181: [GC ref-proc, 0.0819490 secs], 0.0989750 secs]
 [Times: user=1.10 sys=0.15, real=0.10 secs] 
505103.278: [GC cleanup 13G->13G(24G), 0.0293890 secs]
 [Times: user=0.21 sys=0.14, real=0.03 secs] 
505103.308: [GC concurrent-cleanup-start]
505103.308: [GC concurrent-cleanup-end, 0.0001760 secs]
505104.534: [GC pause (young) 505104.538: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 623522, predicted base time: 2265.93 ms, remaining time: 7734.07 ms, target pause time: 10000.00 ms]
 505104.538: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 64 regions, survivors: 12 regions, predicted young region time: 5414.13 ms]
 505104.538: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 64 regions, survivors: 12 regions, old: 0 regions, predicted pause time: 7680.06 ms, target pause time: 10000.00 ms]
 505104.660: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 31.31 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
 505104.660: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 13706985472 bytes, allocation request: 0 bytes, threshold: 7730941110 bytes (30.00 %), source: end of GC]
 505104.660: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 290 regions, reclaimable: 1891629432 bytes (7.34 %), threshold: 5.00 %]
, 0.1258480 secs]
   [Parallel Time: 86.7 ms, GC Workers: 23]
      [GC Worker Start (ms): Min: 505104538.0, Avg: 505104538.4, Max: 505104538.5, Diff: 0.5]
      [Ext Root Scanning (ms): Min: 3.3, Avg: 7.6, Max: 19.2, Diff: 15.9, Sum: 175.1]
      [Update RS (ms): Min: 53.4, Avg: 64.9, Max: 69.0, Diff: 15.6, Sum: 1491.9]
         [Processed Buffers: Min: 76, Avg: 120.3, Max: 149, Diff: 73, Sum: 2766]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 9.8, Avg: 13.5, Max: 14.1, Diff: 4.3, Sum: 310.1]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.8]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.4]
      [GC Worker Total (ms): Min: 85.9, Avg: 86.1, Max: 86.4, Diff: 0.5, Sum: 1979.9]
      [GC Worker End (ms): Min: 505104624.4, Avg: 505104624.4, Max: 505104624.5, Diff: 0.1]
   [Code Root Fixup: 0.1 ms]
   [Code Root Migration: 0.0 ms]
   [Clear CT: 0.7 ms]
   [Other: 38.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 32.0 ms]
      [Ref Enq: 0.6 ms]
      [Free CSet: 0.6 ms]
   [Eden: 1024.0M(1024.0M)->0.0B(1056.0M) Survivors: 192.0M->160.0M Heap: 13.7G(24.0G)->12.9G(24.0G)]
 [Times: user=1.56 sys=0.23, real=0.12 secs] 
  • Have you tried to reduce the heap size? It seems that you have around 13GB of permanent (or long-living) data, and long pauses only occur when GC is trying to clean a great deal of regions and move living objects around (Object Copy takes nearly all the time). Conc. marking cycle is not a stop-the-world, it only signals that GC is going to switch to the mixed mode. If you reduce the heap size, the collections will be more frequent but they will also be shorter in time. I would try 16Gb and default Java options. – Andrew Lygin Aug 10 '16 at 06:11
  • According to the Java options you provided, you have already played around with various settings. What did you try so far? How did you come to those values? – Andrew Lygin Aug 10 '16 at 06:11
  • @AndrewLygin the object copy seems entirely single-threaded and the 10GB eden manages to get sequeezed into 1.3GB of survivors. I.e. only about 1.3GB were live in the young gen (+ whatever got promoted into old gen). Which is also what happened in the previous young collection. so something is off here – the8472 Aug 10 '16 at 11:02
  • Have you tried updating to 1.8? Lots of work went into G1 between the releases. – the8472 Aug 10 '16 at 11:03
  • @Andrew, Actually we have increased the heap size from 8G to the current 24G to avoid full GC. The log I showed above is at one time point. There are other time points when much more requests come and the heap usage increases rapidly. So it's not safe to reduce the heap size. – Rural Hunter Oct 27 '16 at 06:53
  • @the8472, Not yet. It would need some additional work for us to upgrade to jdk 1.8. We may try it later. – Rural Hunter Oct 27 '16 at 06:54
  • I noticed the total Eden size is changed dynamically frequently. The log above shows it changes from 10.3G->128M->1024M. Based on what the G1 GC changes the value? I searched the doc but it seems not described clearly. From my observation, it would be better if it is fixed around at 4G in our case. But changing the size is not recommended in G1. – Rural Hunter Oct 27 '16 at 07:00

1 Answers1

0

Update to Java 8 ASAP - G1 has had a TON of fixes and improvements since Java 7. Likely including this issue -- the predictions I've seen during CSet construction have been generally accurate. If anything they assume GC will take longer than it does. Plus the performance difference is night and day, especially for G1.

You have most of the settings I would recommend at this point, there's one to look at though: consider increasing the eden size with -XX:G1NewSizePercent=10 -- theoretically this increases the time for the young GC cycles, but in practice it may do the opposite (by decreasing the old set size and making objects easier to collect).

This also would worry me:

[Times: user=82.22 sys=7.54, real=90.81 secs] [Times: user=1.56 sys=0.23, real=0.12 secs]

That's an awful lot of system time -- perhaps paging is happening or similar?

BobMcGee
  • 19,824
  • 10
  • 45
  • 57
  • I tried -XX:G1NewSizePercent=30 before but it didn't make much difference. I will try to upgrade to jdk 8 and also check the paging issue. Thanks! – Rural Hunter Nov 22 '16 at 05:31