1

I have strange G1 gc pauses - gc spends a lot of time if 'Ref Proc' phase. I'm running hbase region server with following gc options

Java HotSpot(TM) 64-Bit Server VM (25.192-b12) for linux-amd64 JRE (1.8.0_192-b12), built on Oct  6 2018 06:46:09 by "java_re" with gcc 7.3.0
Memory: 4k page, physical 65963112k(21241308k free), swap 0k(0k free)
CommandLine flags: -XX:G1HeapRegionSize=8388608 -XX:G1HeapWastePercent=15 -XX:G1MaxNewSizePercent=20 -XX:G1MixedGCCountTarget=16 -XX:G1OldCSetRegionThresholdPercent=2 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:InitialHeapSize=6442450944 -XX:InitialTenuringThreshold=1 -XX:InitiatingHeapOccupancyPercent=75 -XX:+ManagementServer -XX:MaxDirectMemorySize=16106127360 -XX:MaxGCPauseMillis=25 -XX:MaxHeapSize=6442450944 -XX:MaxTenuringThreshold=1 -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError=kill -9 %p -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC -XX:+PrintTenuringDistribution -XX:-ResizePLAB -XX:SurvivorRatio=6 -XX:+UnlockExperimentalVMOptions -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC 

Most of pauses take approximatelly 30ms, 99.9% - 70ms But from time to time (a couple of times a day) I have pauses longer 500ms, vast majority of time is taken by 'Ref Prof' Example of such pause:

2018-11-05T20:50:31.718+0000: 48282.452: [GC pause (G1 Evacuation Pause) (mixed)
Desired survivor size 29360128 bytes, new threshold 1 (max 1)
- age   1:    6821648 bytes,    6821648 total
 48282.452: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3250, predicted base time: 21.77 ms, remaining time: 3.23 ms, target pause time: 25.00 ms]
 48282.452: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 37 regions, survivors: 1 regions, predicted young region time: 4.87 ms]
 48282.452: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 16 regions, max: 16 regions]
 48282.452: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 16 regions, expensive: 16 regions, min: 37 regions, remaining time: 0.00 ms]
 48282.452: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 37 regions, survivors: 1 regions, old: 16 regions, predicted pause time: 37.72 ms, target pause time: 25.00 ms]
2018-11-05T20:50:31.735+0000: 48282.469: [SoftReference, 0 refs, 0.0030858 secs]2018-11-05T20:50:32.284+0000: 48283.018: [WeakReference, 2208 refs, 0.0086701 secs]2018-11-05T20:50:32.293+0000: 48283.027: [FinalReference, 58 refs, 0.0025096 secs]2018-11-05T20:50:32.295+0000: 48283.029: [PhantomReference, 0 refs, 10 refs, 0.0048406 secs]2018-11-05T20:50:32.300+0000: 48283.034: [JNI Weak Reference, 0.0000199 secs] 48283.040: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 498 regions, reclaimable: 2389907808 bytes (37.10 %), threshold: 15.00 %]
, 0.5876303 secs]
   [Parallel Time: 16.4 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 48282452.3, Avg: 48282452.6, Max: 48282453.0, Diff: 0.6]
      [Ext Root Scanning (ms): Min: 0.4, Avg: 0.8, Max: 1.3, Diff: 0.8, Sum: 13.8]
      [Update RS (ms): Min: 0.2, Avg: 1.4, Max: 2.0, Diff: 1.8, Sum: 24.5]
         [Processed Buffers: Min: 1, Avg: 3.2, Max: 8, Diff: 7, Sum: 57]
      [Scan RS (ms): Min: 5.9, Avg: 6.4, Max: 6.8, Diff: 1.0, Sum: 114.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 7.1, Avg: 7.2, Max: 8.1, Diff: 1.0, Sum: 129.9]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.1]
         [Termination Attempts: Min: 1, Avg: 1.4, Max: 3, Diff: 2, Sum: 25]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3]
      [GC Worker Total (ms): Min: 15.4, Avg: 15.8, Max: 16.1, Diff: 0.6, Sum: 284.2]
      [GC Worker End (ms): Min: 48282468.4, Avg: 48282468.4, Max: 48282468.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 2.0 ms]
   [Other: 569.2 ms]
      [Choose CSet: 0.2 ms]
      [Ref Proc: 565.6 ms]
      [Ref Enq: 1.2 ms]
      [Redirty Cards: 1.5 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.4 ms]
   [Eden: 296.0M(296.0M)->0.0B(280.0M) Survivors: 8192.0K->24.0M Heap: 4615.3M(6144.0M)->4238.7M(6144.0M)]
 [Times: user=0.43 sys=0.01, real=0.59 secs] 

A lot of pauses spend around 20-30ms in 'Ref Proc', and what is common for them - they spend time collecting SoftReference, for example [SoftReference, 0 refs, 0.0166254 secs] How can it take so long to collect zero references?

How can I reduce these pauses?

  • Zero collected references does not imply that there weren’t references. It just implies that none of them was collected, e.g. because their referent is also strongly reachable or in case of soft references, because the current policy’s conditions were not fulfilled. Still, traversing the encountered references, to check whether they are eligible to clearing and enqueuing, may take some time, regardless of how many are eligible. – Holger Nov 07 '18 at 13:54
  • I see, good point. But according to log in never collects any soft references, it is always zero. and according to heap dump there are very few of them exist - a couple of hundreds. – user1987133 Nov 08 '18 at 10:42

0 Answers0