When using G1GC on JVM 8, a Full GC (Allocation Failure) happens before memory is exhausted. It occurs every 30 minutes, and each time it stops the application for 6-10 seconds, during which about 300-500 requests time out. That is a serious problem for me.

I don't understand how [Full GC (Allocation Failure) 16G->13G(20G), 8.9794231 secs] can happen. The allocation request is only 304 bytes, but the heap still has about 4G of free space, so why can't it be allocated?

I suspect it is a fragmentation problem, but I don't know how to verify that. Any verification method is welcome.
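
For reference, the only checks I can think of running myself are a per-region liveness dump after marking and a class histogram. The flags and commands below are what I believe work on an OpenJDK 8 HotSpot build; treat them as a sketch:

     # diagnostic flags (assumed available on OpenJDK 8): print per-region liveness after each marking cycle
     -XX:+UnlockDiagnosticVMOptions
     -XX:+G1PrintRegionLivenessInfo

     # histogram of heap objects without forcing a full GC
     # (use -histo:live only if an extra full GC is acceptable)
     jmap -histo <pid>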

The details of the last Full GC are:

2019-12-11T18:42:12.687+0800: 1312029.392: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
Desired survivor size 67108864 bytes, new threshold 3 (max 3)
 1312029.393: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 37379, predicted base time: 20.39 ms, remaining time: 69.61 ms, target pause time: 90.00 ms]
 1312029.393: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 0 regions, survivors: 0 regions, predicted young region time: 0.00 ms]
 1312029.393: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 20.39 ms, target pause time: 90.00 ms]
, 0.0180293 secs]
   [Parallel Time: 12.8 ms, GC Workers: 16]
      [GC Worker Start (ms): Min: 1312029393.4, Avg: 1312029393.5, Max: 1312029393.6, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 3.1, Avg: 3.9, Max: 11.9, Diff: 8.8, Sum: 62.0]
      [Update RS (ms): Min: 0.0, Avg: 5.0, Max: 6.1, Diff: 6.1, Sum: 80.2]
         [Processed Buffers: Min: 0, Avg: 9.2, Max: 14, Diff: 14, Sum: 148]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 0.2, Avg: 0.3, Max: 0.5, Diff: 0.3, Sum: 5.0]
      [Termination (ms): Min: 0.0, Avg: 3.3, Max: 3.8, Diff: 3.8, Sum: 52.4]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 16]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 12.4, Avg: 12.5, Max: 12.6, Diff: 0.2, Sum: 199.9]
      [GC Worker End (ms): Min: 1312029406.0, Avg: 1312029406.0, Max: 1312029406.0, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 5.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 2.4 ms]
      [Ref Enq: 0.2 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.4 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 0.0B(1024.0M)->0.0B(1024.0M) Survivors: 0.0B->0.0B Heap: 16.4G(20.0G)->16.4G(20.0G)]
Heap after GC invocations=452782 (full 1):
 garbage-first heap   total 20971520K, used 17217092K [0x00000002c0000000, 0x00000002c2001400, 0x00000007c0000000)
  region size 32768K, 0 young (0K), 0 survivors (0K)
 Metaspace       used 60984K, capacity 61408K, committed 62156K, reserved 1105920K
  class space    used 6123K, capacity 6258K, committed 6400K, reserved 1048576K
}
 [Times: user=0.00 sys=0.00, real=0.01 secs]
2019-12-11T18:42:12.705+0800: 1312029.411: [GC concurrent-root-region-scan-start]
2019-12-11T18:42:12.705+0800: 1312029.411: [GC concurrent-root-region-scan-end, 0.0000510 secs]
2019-12-11T18:42:12.705+0800: 1312029.411: [GC concurrent-mark-start]
2019-12-11T18:42:12.705+0800: 1312029.411: Total time for which application threads were stopped: 0.0282629 seconds, Stopping threads took: 0.0003029 seconds
2019-12-11T18:42:12.706+0800: 1312029.412: Application time: 0.0009042 seconds
 1312029.419: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 304 bytes]
 1312029.419: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 33554432 bytes, attempted expansion amount: 33554432 bytes]
 1312029.419: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
{Heap before GC invocations=452782 (full 1):
 garbage-first heap   total 20971520K, used 17217092K [0x00000002c0000000, 0x00000002c2001400, 0x00000007c0000000)
  region size 32768K, 0 young (0K), 0 survivors (0K)
 Metaspace       used 60984K, capacity 61408K, committed 62156K, reserved 1105920K
  class space    used 6123K, capacity 6258K, committed 6400K, reserved 1048576K
2019-12-11T18:42:12.714+0800: 1312029.420: [Full GC (Allocation Failure)  16G->13G(20G), 8.9794231 secs]
   [Eden: 0.0B(1024.0M)->0.0B(1248.0M) Survivors: 0.0B->0.0B Heap: 16.4G(20.0G)->13.1G(20.0G)], [Metaspace: 60984K->60984K(1105920K)]
Heap after GC invocations=452783 (full 2):
 garbage-first heap   total 20971520K, used 13750684K [0x00000002c0000000, 0x00000002c2001400, 0x00000007c0000000)
  region size 32768K, 0 young (0K), 0 survivors (0K)
 Metaspace       used 60984K, capacity 61408K, committed 62156K, reserved 1105920K
  class space    used 6123K, capacity 6258K, committed 6400K, reserved 1048576K
}
 [Times: user=6.25 sys=2.34, real=8.98 secs]
2019-12-11T18:42:21.694+0800: 1312038.400: Total time for which application threads were stopped: 8.9877650 seconds, Stopping threads took: 0.0001943 seconds
2019-12-11T18:42:21.695+0800: 1312038.400: [GC concurrent-mark-abort]
2019-12-11T18:42:21.931+0800: 1312038.637: Application time: 0.2366224 seconds
{Heap before GC invocations=452783 (full 2):
 garbage-first heap   total 20971520K, used 15028636K [0x00000002c0000000, 0x00000002c2001400, 0x00000007c0000000)
  region size 32768K, 39 young (1277952K), 0 survivors (0K)
 Metaspace       used 60984K, capacity 61408K, committed 62156K, reserved 1105920K
  class space    used 6123K, capacity 6258K, committed 6400K, reserved 1048576K
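
(For reference, the log above was produced with roughly the following logging flags, reconstructed from the output format; I may be missing one or two.)

     -XX:+PrintGCDetails
     -XX:+PrintGCDateStamps
     -XX:+PrintHeapAtGC
     -XX:+PrintAdaptiveSizePolicy
     -XX:+PrintTenuringDistribution
     -XX:+PrintGCApplicationStoppedTime
     -XX:+PrintGCApplicationConcurrentTime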

The JVM GC arguments:

     -XX:+UseG1GC
     -Xmx20480m
     -XX:+UnlockExperimentalVMOptions

     -XX:MaxGCPauseMillis=90
     -XX:ConcGCThreads=4
     -XX:ParallelGCThreads=16
     -XX:MaxTenuringThreshold=3
     -XX:G1HeapRegionSize=32m
     -XX:InitiatingHeapOccupancyPercent=70
     -XX:G1NewSizePercent=5
     -XX:G1MaxNewSizePercent=30
     -XX:G1MixedGCCountTarget=16
     -XX:G1OldCSetRegionThresholdPercent=3
     -XX:G1HeapWastePercent=15

Why these arguments are set this way:
1. IHOP is 70% because my program holds a lot of caches and in-flight requests (about 65% of the heap);
2. MaxNewSizePercent could be smaller, but that should not cause a full GC;
3. -XX:G1MixedGCCountTarget and -XX:G1OldCSetRegionThresholdPercent combine so that a mixed GC cycle consists of 16 collections of about 600M each; with about 13G of resident memory, reclaiming up to 9.6G per cycle is an acceptable number (see the arithmetic below);
4. -XX:G1HeapWastePercent is 15%, which actually bothers me too: there are many large objects (more than 16M) in my process, so the only reason I set it is to avoid mixed GCs running too frequently.
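
To spell out the arithmetic behind point 3 (my own back-of-the-envelope numbers, not measured):

     old regions per mixed GC  ≈ 20480M * 3%  (G1OldCSetRegionThresholdPercent) ≈ 600M
     mixed GCs per cycle       = 16            (G1MixedGCCountTarget)
     old space per mixed cycle ≈ 16 * 600M     ≈ 9.6G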

But I can't find the reason for this full GC in the arguments above. Any advice or suggestions will be much appreciated!

    Would you rather it just threw an `OutOfMemoryError` instead? – Kayaman Dec 11 '19 at 11:44
  • The garbage collector collects garbage, that's what it does. I fail to see why or how this is an issue in this case. Do you think it should behave differently? If so: how do you *expect* it to behave? – Joachim Sauer Dec 11 '19 at 11:46
  • Since you said “any advice or suggestion will be very appreciated”, here’s my advice: remove every JVM option you don’t understand. – Holger Dec 11 '19 at 11:52
  • @JoachimSauer I mean the heap actually still has 4G of space, so why can't it be allocated? In my understanding, G1 should not need a Full GC because of memory fragmentation, so I don't know why this happens. In fact, I have 5 machines running the same process with about the same write traffic (50MB / s), but only one of them has this problem; maybe it is caused by QPS? – a.fan Dec 11 '19 at 12:42
  • Could you clarify why this is a problem? Or are you just interested in the details of how the garbage collector works? – Thomas Padron-McCarthy Dec 11 '19 at 12:46
  • @Holger I can roughly describe why these are set. IHOP is 70% because my program holds a lot of caches and in-flight requests (about 65% of the heap); MaxNewSizePercent could be smaller, but that should not cause a full GC; -XX:G1MixedGCCountTarget and -XX:G1OldCSetRegionThresholdPercent combine so that a mixed GC cycle consists of 16 collections of about 600M each; with about 13G of resident memory, 9.6G per cycle is an acceptable number; – a.fan Dec 11 '19 at 12:55
  • @Holger And -XX:G1HeapWastePercent is 15%, which actually bothers me too. There are many large objects (more than 16M) in my process, so the only reason I set it is to avoid mixed GCs running too frequently. But I can't find the reason for this full GC in the arguments above. – a.fan Dec 11 '19 at 12:58
  • @ThomasPadron-McCarthy This full GC happens every 30 minutes and stops the application for 6-10 seconds, during which about 300-500 requests time out. – a.fan Dec 11 '19 at 13:04
  • @Mr王: all of these are good details that you should probably put into the question so that people understand the context. As it stands right now it looked like "it's doing a garbage collection, how do I stop it" (which is a silly question without the required context). – Joachim Sauer Dec 11 '19 at 14:14
  • When a concurrent collection is initiated at 70% heap occupancy, it's possible that the collection is not fast enough to keep up with the application threads still doing allocations. The heap doesn't have to be at 100% to trigger a full collection; it's already enough if fragmentation prevents the concurrent collection from finding an entirely empty target region. – Holger Dec 11 '19 at 16:12
  • @Holger this would be a very good case to try `Shenandoah` with its `-XX:ShenandoahPacingMaxDelay` and/or `-XX:ShenandoahGuaranteedGCInterval`... but suggesting to change the `GC` is a huge thing for almost everyone, I guess. – Eugene Dec 11 '19 at 16:41
  • It’s also worth checking whether [this bug](https://bugs.openjdk.java.net/browse/JDK-8140597) applies. – Holger Dec 11 '19 at 16:52
  • @Eugene As far as I know Shenandoah is more suitable for larger heaps. It seems like a promising solution, but I'm not sure it can work well with a 20G heap. Maybe I should do some research. – a.fan Dec 12 '19 at 02:30
  • @Holger I also suspected the problem was caused by fragmentation, but that contradicts the G1 principles I learned. And notice the log line `allocation request: 304 bytes`: it only needs 304 bytes, and there is still 4GB free, so I doubt the fragmentation could be that severe. Do you know a method to verify this? – a.fan Dec 12 '19 at 02:41
  • @JoachimSauer Sorry, this is my first question, so I'm not so familiar with the conventions. I assumed that everyone would read the logs and understand the context. I will add that to the question. – a.fan Dec 12 '19 at 02:46
  • It's not for larger heaps, by any means. That's one of the common misconceptions. – Eugene Dec 12 '19 at 02:57
  • @Holger In fact, I have seen this report; I ran into the same problem with `-XX:G1HeapRegionSize=16m`, and in that configuration the GC logs print a lot of humongous allocations, which causes frequent mixed GCs, so I changed the region size to 32m and G1HeapWastePercent to 15%. Unfortunately, it still happens. As it stands right now my doubt is: I'm using OpenJDK 1.8, so could [this bug](https://bugs.openjdk.java.net/browse/JDK-8140597) be present? – a.fan Dec 12 '19 at 03:49
  • @Eugene I know very little about Shenandoah, but our team used it for HBase with a 50G+ heap (maybe 90G), so I had that impression. – a.fan Dec 12 '19 at 03:57
  • The report says it's fixed in Java 9, and there's no linked backport issue, so I assume it's present in your installation. – Holger Dec 12 '19 at 07:55
