I am running a build system. We used to use the CMS collector, but we started suffering from very long full GC cycles; throughput (time not spent in GC) was around 90%. So I decided to switch to G1, on the assumption that even if overall GC time were longer, the pauses would be shorter, ensuring higher availability. This worked even better than I expected: I saw no full GC for almost 3 days, throughput was 97%, and overall GC performance was far better. (All screenshots and data are from GCViewer.)

Until now (day 6). Today the system simply went berserk. Old space utilization is just barely under 100%, and a full GC is triggered every 2-3 minutes or so.

Heap size is 20G (128G Ram total). The flags I am currently using are:

-XX:+UseG1GC
-XX:MaxPermSize=512m
-XX:MaxGCPauseMillis=800
-XX:GCPauseIntervalMillis=8000 
-XX:NewRatio=4
-XX:PermSize=256m
-XX:InitiatingHeapOccupancyPercent=35
-XX:+ParallelRefProcEnabled

plus logging flags. What I seem to be missing is -XX:ParallelGCThreads=20 (I have 32 processors); the default should be 8. I have also read in Oracle's documentation that -XX:G1NewSizePercent=4 is suggested for a 20G heap; the default should be 5.
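Rather than relying on what the defaults "should be", the values the JVM actually settled on can be read at runtime. The following is a minimal sketch, assuming a HotSpot JVM (it uses the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`); the class name `ShowGcFlags` and the list of flags printed are illustrative choices, not something from the product in question. It reports on the JVM it runs in, so for a boxed product it would have to be run with the same flags and on the same machine to be comparable.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the effective values of a few GC-related flags.
class ShowGcFlags {

    // Returns the current value of a HotSpot VM option as a string.
    // Throws IllegalArgumentException if this JVM does not know the option.
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        String[] names = {
            "UseG1GC", "ParallelGCThreads", "MaxGCPauseMillis",
            "InitiatingHeapOccupancyPercent"
        };
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```

Running `java -XX:+PrintFlagsFinal -version` with the same options is another way to get the same information without writing any code.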

I am using Java HotSpot(TM) 64-Bit Server VM 1.7.0_76, Oracle Corporation

What would you suggest? Do I have obvious mistakes? What should I change? Am I too greedy by giving Java only 20G? The assumption here is that giving it too much heap would mean longer GC, as there is simply more to clean (peasant logic).

PS: The application is not mine. For me it's a boxed product.

Erki M.
  • I think there is a memory leak somewhere in your software that slowly consumes the available heap space, making the GC's job harder and harder as time goes on. So the solution is not to be found in the GC algorithm or in the heap settings (the heap is eventually going to fill up regardless of its size). You have to fix your software, or live with the fact that you need to restart it every now and then. Interestingly, it looks like your heap never completely fills and crashes your program, so maybe I'm just wrong. – Giulio Franco Mar 06 '15 at 15:33
  • You should pastebin a GC log printed with `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Xloggc:`; that will be more useful for understanding G1's decisions. If you can, you should try Java 8; G1 has undergone a lot of changes since Java 7. Many of its heuristics have been improved and some bottlenecks removed. IIRC, in 7 there are a few cases where G1 can sort of "paint itself into a corner". – the8472 Mar 06 '15 at 16:58
  • I found the cause. The system also allows users to execute custom scripts during the build process. After a (very long) investigation, it turned out that one user was constantly executing a script that did not release its memory, causing the heap baseline to go up steadily, so each GC cycle released less and less. – Erki M. Apr 14 '15 at 22:55
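A steadily rising heap baseline of the kind described in the last comment can be watched for by sampling how much of the old generation is still in use right after a collection. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, the old-generation pool is found by looking for "Old Gen" in the pool name (which matches names such as "G1 Old Gen" but is not guaranteed for every collector), and it reads the JVM it runs in, so monitoring a separate product would mean reading the same beans over a remote JMX connection instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper for spotting a rising post-GC baseline in the old generation.
class OldGenBaseline {

    // Bytes still used in the old generation after the most recent collection,
    // or -1 if no matching pool is found or the pool does not report this value.
    static long oldGenAfterGcBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: the old generation pool name contains "Old Gen".
            if (pool.getType() == MemoryType.HEAP && pool.getName().contains("Old Gen")) {
                MemoryUsage afterGc = pool.getCollectionUsage();
                if (afterGc != null) {
                    return afterGc.getUsed();
                }
            }
        }
        return -1;
    }

    // True when there are at least two samples and each one is strictly
    // larger than the one before it.
    static boolean steadilyRising(long[] samples) {
        if (samples.length < 2) {
            return false;
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Old gen used after last GC: " + oldGenAfterGcBytes() + " bytes");
    }
}
```

Samples taken hours apart that keep growing, as `steadilyRising` checks, would point to retained memory rather than to GC tuning.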

1 Answer

What would you suggest? Do I have obvious mistakes? What should I change? Am I too greedy by giving Java only 20G? The assumption here is that giving it too much heap would mean longer GC, as there is simply more to clean (peasant logic).

If it triggers full GCs while your occupancy stays near those 20 GB, it's possible that the GC simply does not have enough breathing room, either to meet the demand of huge allocations or to meet some of its goals (throughput, pause times), forcing full GCs as a fallback.

So what you can attempt is increasing the heap limit or relaxing the throughput goals.
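To judge what throughput is realistic before and after such a change, the figure the question uses (share of time not spent in GC) can be approximated from the JVM's own counters. This is a rough sketch, not a replacement for GCViewer: the class name `GcThroughput` is hypothetical, it measures the JVM it runs in, and summing `getCollectionTime()` over all collector beans is an approximation whose exact meaning depends on the collector and JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: approximates throughput as the share of uptime not spent in GC.
class GcThroughput {

    // Percentage of uptime that was not spent in garbage collection.
    static double throughputPercent(long uptimeMs, long gcMs) {
        if (uptimeMs <= 0) {
            return 100.0;
        }
        return 100.0 * (uptimeMs - gcMs) / uptimeMs;
    }

    // Accumulated collection time in milliseconds across all collector beans.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC throughput: %.2f%%%n",
                throughputPercent(uptimeMs, totalGcMillis()));
    }
}
```

With these definitions, 100 ms of GC in 1000 ms of uptime corresponds to the 90% figure reported for the CMS setup.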

As mentioned in my comment, you can also try upgrading to Java 8 for improved G1 heuristics.

For further advice, GC logs covering the "berserk" behavior would be useful.

the8472