Why does G1 cost so much time for object copy?

Question

The following is my gc log:

2016-08-16T01:45:35.968+0000: 62265.934: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 473956352 bytes, new threshold 15 (max 15)
- age   1:   12641224 bytes,   12641224 total
- age   2:    3092400 bytes,   15733624 total
- age   3:    1914704 bytes,   17648328 total
- age   4:     204696 bytes,   17853024 total
- age   5:     389896 bytes,   18242920 total
- age   6:     101528 bytes,   18344448 total
- age   7:    1106168 bytes,   19450616 total
- age   8:     344336 bytes,   19794952 total
- age   9:     301328 bytes,   20096280 total
- age  10:     309576 bytes,   20405856 total
- age  11:     305464 bytes,   20711320 total
- age  12:      32672 bytes,   20743992 total
- age  13:      41264 bytes,   20785256 total
- age  14:      50960 bytes,   20836216 total
- age  15:      56904 bytes,   20893120 total
 62265.934: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7520, predicted base time: 185.93 ms, remaining time: 14.07 ms, target pause time: 200.00 ms]
 62265.934: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1793 regions, survivors: 9 regions, predicted young region time: 7.32 ms]
 62265.934: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1793 regions, survivors: 9 regions, old: 0 regions, predicted pause time: 193.25 ms, target pause time: 200.00 ms]
2016-08-16T01:45:36.626+0000: 62266.592: [SoftReference, 0 refs, 0.0208511 secs]2016-08-16T01:45:36.647+0000: 62266.613: [WeakReference, 21 refs, 0.0101522 secs]2016-08-16T01:45:36.657+0000: 62266.623: [FinalReference, 5106 refs, 0.0153084 secs]2016-08-16T01:45:36.673+0000: 62266.639: [PhantomReference, 24 refs, 0 refs, 0.0473559 secs]2016-08-16T01:45:36.720+0000: 62266.686: [JNI Weak Reference, 0.0000260 secs], 0.7593297 secs]
   [Parallel Time: 655.2 ms, GC Workers: 20]
      [GC Worker Start (ms): Min: 62265934.2, Avg: 62265936.5, Max: 62265953.3, Diff: 19.0]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.9, Max: 8.7, Diff: 8.7, Sum: 17.9]
      [Update RS (ms): Min: 0.0, Avg: 0.6, Max: 1.4, Diff: 1.4, Sum: 12.9]
         [Processed Buffers: Min: 0, Avg: 6.2, Max: 24, Diff: 24, Sum: 125]
      [Scan RS (ms): Min: 0.1, Avg: 0.3, Max: 0.5, Diff: 0.3, Sum: 6.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
      [Object Copy (ms): Min: 0.0, Avg: 256.0, Max: 650.0, Diff: 650.0, Sum: 5119.2]
      [Termination (ms): Min: 0.0, Avg: 394.2, Max: 637.9, Diff: 637.9, Sum: 7883.4]
         [Termination Attempts: Min: 1, Avg: 2.5, Max: 9, Diff: 8, Sum: 51]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum: 3.8]
      [GC Worker Total (ms): Min: 635.7, Avg: 652.2, Max: 654.6, Diff: 18.9, Sum: 13043.7]
      [GC Worker End (ms): Min: 62266588.5, Avg: 62266588.7, Max: 62266588.9, Diff: 0.4]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 1.4 ms]
   [Other: 102.5 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 96.4 ms]
      [Ref Enq: 1.6 ms]
      [Redirty Cards: 1.1 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 2.6 ms]
   [Eden: 7172.0M(7172.0M)->0.0B(928.0M) Survivors: 36.0M->60.0M Heap: 7522.8M(11.7G)->375.5M(11.7G)]
 [Times: user=8.15 sys=3.37, real=0.76 secs]

And my java options is:

-server -Xms8G -Xmx12G -XX:+UnlockDiagnosticVMOptions -XX:G1HeapRegionSize=4m -Xss512K -XX:+HeapDumpOnOutOfMemoryError -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5 -XX:InitiatingHeapOccupancyPercent=70 -XX:+ParallelRefProcEnabled -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution -Xloggc:./gc.log

In addition, my cpu has 16 cores.So after I reduce the ParallelGCThreads to 15, it cut the young gc time by half. But still this problem happend.And the following is the new gc log :

2016-08-22T07:42:58.651+0000: 65778.814: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 322961408 bytes, new threshold 15 (max 15)
- age   1:    6522744 bytes,    6522744 total
- age   2:    2609200 bytes,    9131944 total
- age   3:    1348504 bytes,   10480448 total
- age   4:     557144 bytes,   11037592 total
- age   5:     251744 bytes,   11289336 total
- age   6:     972272 bytes,   12261608 total
- age   7:     500232 bytes,   12761840 total
- age   8:    2242544 bytes,   15004384 total
- age   9:     116296 bytes,   15120680 total
- age  10:     280728 bytes,   15401408 total
- age  11:      85728 bytes,   15487136 total
- age  12:     394272 bytes,   15881408 total
- age  13:     295568 bytes,   16176976 total
- age  14:     728600 bytes,   16905576 total
- age  15:    1356328 bytes,   18261904 total
 65778.814: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 8112, predicted base time: 62.52 ms, remaining time: 137.48 ms, target pause time: 200.00 ms]
 65778.814: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1223 regions, survivors: 5 regions, predicted young region time: 7.95 ms]
 65778.814: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1223 regions, survivors: 5 regions, old: 0 regions, predicted pause time: 70.46 ms, target pause time: 200.00 ms]
2016-08-22T07:43:02.747+0000: 65782.910: [SoftReference, 0 refs, 0.0021318 secs]2016-08-22T07:43:02.749+0000: 65782.912: [WeakReference, 4 refs, 0.0013452 secs]2016-08-22T07:43:02.750+0000: 65782.913: [FinalReference, 4029 refs, 0.0025002 secs]2016-08-22T07:43:02.753+0000: 65782.916: [PhantomReference, 11 refs, 0 refs, 0.0027905 secs]2016-08-22T07:43:02.756+0000: 65782.919: [JNI Weak Reference, 0.0000197 secs], 4.1088277 secs]
   [Parallel Time: 4094.4 ms, GC Workers: 15]
      [GC Worker Start (ms): Min: 65778814.7, Avg: 65778817.2, Max: 65778820.8, Diff: 6.1]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 1.5, Diff: 1.5, Sum: 9.2]
      [Update RS (ms): Min: 0.0, Avg: 1.1, Max: 1.9, Diff: 1.9, Sum: 16.0]
         [Processed Buffers: Min: 0, Avg: 8.7, Max: 27, Diff: 27, Sum: 130]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 3.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
      [Object Copy (ms): Min: 4059.8, Avg: 4081.9, Max: 4090.2, Diff: 30.4, Sum: 61228.9]
      [Termination (ms): Min: 0.0, Avg: 7.4, Max: 28.2, Diff: 28.2, Sum: 111.1]
         [Termination Attempts: Min: 1, Avg: 1.3, Max: 3, Diff: 2, Sum: 19]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum: 1.6]
      [GC Worker Total (ms): Min: 4087.8, Avg: 4091.4, Max: 4094.0, Diff: 6.2, Sum: 61370.5]
      [GC Worker End (ms): Min: 65782908.5, Avg: 65782908.6, Max: 65782908.7, Diff: 0.2]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.8 ms]
   [Other: 13.5 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 9.5 ms]
      [Ref Enq: 0.8 ms]
      [Redirty Cards: 0.8 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 1.8 ms]
   [Eden: 4892.0M(4892.0M)->0.0B(388.0M) Survivors: 20.0M->20.0M Heap: 5284.2M(8192.0M)->392.7M(8192.0M)]
 [Times: user=0.00 sys=62.51, real=4.11 secs]

score 1 · Answer 1 · answered Aug 16 '16 at 07:30

1

While the G1 collector attempts to honour the Max GC time, it is not a hard limit and it can easily go over this time. One case can be having a large object to copy, or it appears at least one of the threads took a long time to perform it's copies.

If you have less than 20 CPUs, or another process running which means you have less than 20 free CPU, a GC thread can be schedule out for quite a while before it gets a chance to run again and complete.

If you look at

user=8.15 sys=3.37, real=0.76 secs

This suggest you used 11.52 seconds of CPU in 0.76 seconds, which means you only had about 15.1 CPUs running at once so about 5 of them couldn't run on average.

I would try reducing the number of CPU to 15 so you have no more threads than you have free CPUs.

answered Aug 16 '16 at 07:30

Peter Lawrey

525,659
79
751
1,130

Thanks for your kind reply! Actually my CPU is 16 cores, I had reduce the `ParallelGCThreads` to 15, it cut the young gc time by half, great! But this problem is still happen, may any other factor cause this ? – bychjzh Aug 16 '16 at 15:20
@bychjzh can you include a log for after you have tuned the cpus? – Peter Lawrey Aug 16 '16 at 21:31
@bychjzh You have "[Times: user=0.00 sys=62.51, real=4.11 secs]" which indicates all the time was spent in the kernel, most likely it was swapping heavily. I would check that you have plenty of free memory. – Peter Lawrey Aug 22 '16 at 08:20
At that time i still had 8+G cached memory, can it be used for that time swapping? – bychjzh Aug 22 '16 at 09:02
@bychjzh that sounds like enough. In any case, almost all the time is in the kernel. ARe you running on a bare machine or a virtual OS? – Peter Lawrey Aug 22 '16 at 09:06
I run this instance in the docker, and what is the difference between use bare machine and a virtual OS? – bychjzh Aug 22 '16 at 09:27
@bychjzh when run in a virtual machine you can appear to have more memory than you actual do. I suspect your bare machine has run out but your docker image doesn't know this. – Peter Lawrey Aug 22 '16 at 12:04
I tuned -Xmx8G and keep observing. Thanks again. – bychjzh Aug 22 '16 at 12:46
@bychjzh you really need to monitor the memory utilisation and swapping of the bare machine. – Peter Lawrey Aug 22 '16 at 12:50
Did you ever find out? Maybe NUMA? Sys time is so high, there is some problem in that machine – cocorossello Apr 25 '21 at 06:47

Why does G1 cost so much time for object copy?

1 Answers1