
My application is deployed on a WebLogic instance running on Solaris, on a machine with two SPARC T4 8-core 3.0 GHz processors. This WebLogic instance uses the G1 GC, and I think the current configuration can be improved:

GC_OPTIONS=" -server -XX:ConcGCThreads=4 -XX:+UseG1GC -XX:+DisableExplicitGC 
             -XX:MaxGCPauseMillis=300 -XX:GCPauseIntervalMillis=1000 -XX:+UseNUMA
             -XX:+UseCompressedOops -XX:ReservedCodeCacheSize=128m 
             -Dweblogic.ThreadPoolPercentSocketReader=50 -Dweblogic.ThreadPoolSize=100 
             -Dweblogic.SelfTunningThreadPoolSizeMin=100 "

It strikes me that ConcGCThreads has been set without also establishing a value for ParallelGCThreads. I think a good start would be to bring these two values into a sensible proportion. Oracle's documentation says:

-XX:ParallelGCThreads=n

Sets the value of the STW worker threads. Sets the value of n to the number of logical processors. The value of n is the same as the number of logical processors up to a value of 8.

If there are more than eight logical processors, sets the value of n to approximately 5/8 of the logical processors. This works in most cases except for larger SPARC systems where the value of n can be approximately 5/16 of the logical processors.

There is no clear statement of what a "logical processor" is. From what I've found on the web, it can be understood as a hardware thread running on a physical core. The number of "logical processors" in the rack this WebLogic runs on would thus amount to 128 (two 8-core processors, each core "with the ability to switch between 8 threads", according to http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/o11-090-sparc-t4-arch-496245.pdf).

But I'm told that 64 of these 128 "logical processors" are reserved for the database, the remaining ones being shared between Tuxedo and WebLogic servers. Assuming there are two WebLogic instances, and that it's safe to consider that the Tuxedo and WebLogic instances consume the same number of threads, it could be argued that (64/3)*(5/16) ≈ 6 or 7 ParallelGCThreads, and thus 1 or at most 2 ConcGCThreads, would be acceptable.
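
To sanity-check that back-of-envelope estimate, the arithmetic can be written out as a small script. This is only a sketch: `LOGICAL_CPUS=21` is my assumed per-instance share (roughly 64/3), the 5/16 factor is the large-SPARC rule of thumb from the Oracle doc quoted above, and the 1:4 Conc-to-Parallel ratio mirrors the JVM's default heuristic:

```shell
#!/bin/sh
# Assumed logical-CPU share for one WebLogic instance: ~64/3 (an assumption)
LOGICAL_CPUS=21

# Large-SPARC rule of thumb: ParallelGCThreads ~ 5/16 of logical processors
PAR_GC=$(( LOGICAL_CPUS * 5 / 16 ))
[ "$PAR_GC" -lt 1 ] && PAR_GC=1

# ConcGCThreads defaults to roughly 1/4 of ParallelGCThreads
CONC_GC=$(( PAR_GC / 4 ))
[ "$CONC_GC" -lt 1 ] && CONC_GC=1

echo "-XX:ParallelGCThreads=$PAR_GC -XX:ConcGCThreads=$CONC_GC"
# → -XX:ParallelGCThreads=6 -XX:ConcGCThreads=1
```

With integer arithmetic this lands on the lower end (6) of the 6-or-7 estimate above.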

Do you think these are reasonable values to start the tuning? Any suggestion is welcome.

Edit: as of today I have some logs generated with -XX:+PrintGCDetails enabled. Here's how it looks in GCViewer:

[Image: G1 garbage collection log as rendered by GCViewer]

My interpretation:

  • heap usage slowly grows up as users do their tasks
  • tenured heap usage (the magenta line under the blue zigzags, which represent the overall used heap) also grows, although there is still a fair amount of space available in the tenured generation
  • the young generation, on the contrary, has very little headroom and needs to be garbage collected steadily
  • although there is nothing immediately disquieting about this picture, the trend is upwards. Moreover, the GC pause times (slightly more than 1 s if no initial mark is involved, almost 2 s otherwise) are far longer than the target of 300 ms

Here's a sample entry from the garbage collection log:

2014-01-16T11:18:12.728+0100: 50293.457: [GC pause (young), 1.36713100 secs]
   [Parallel Time: 1308.6 ms]
      [GC Worker Start (ms):  50293458.0  50293458.0  50293458.0  50293458.1  50293458.1  50293458.1  50293458.2  50293458.2
       Avg: 50293458.1, Min: 50293458.0, Max: 50293458.2, Diff:   0.2]
      [Ext Root Scanning (ms):  982.5  174.5  146.2  150.6  170.6  139.6  154.5  158.8
       Avg: 259.7, Min: 139.6, Max: 982.5, Diff: 842.9]
      [Update RS (ms):  0.0  16.9  36.2  42.3  18.6  54.6  38.8  34.9
       Avg:  30.3, Min:   0.0, Max:  54.6, Diff:  54.6]
         [Processed Buffers : 0 15 21 31 18 27 11 21
          Sum: 144, Avg: 18, Min: 0, Max: 31, Diff: 31]
      [Scan RS (ms):  0.2  9.8  9.7  8.7  10.0  10.0  8.1  9.0
       Avg:   8.2, Min:   0.2, Max:  10.0, Diff:   9.8]
      [Object Copy (ms):  275.1  132.6  142.2  131.8  133.9  129.4  131.9  130.5
       Avg: 150.9, Min: 129.4, Max: 275.1, Diff: 145.6]
      [Termination (ms):  0.0  924.0  923.4  924.2  924.5  924.0  924.3  924.5
       Avg: 808.6, Min:   0.0, Max: 924.5, Diff: 924.5]
         [Termination Attempts : 1 1049 1140 1011 881 979 894 780
          Sum: 6735, Avg: 841, Min: 1, Max: 1140, Diff: 1139]
      [GC Worker End (ms):  50294715.8  50294715.9  50294716.0  50294715.9  50294715.9  50294715.9  50294715.9  50294715.9
       Avg: 50294715.9, Min: 50294715.8, Max: 50294716.0, Diff:   0.1]
      [GC Worker (ms):  1257.9  1257.9  1257.9  1257.9  1257.7  1257.8  1257.7  1257.7
       Avg: 1257.8, Min: 1257.7, Max: 1257.9, Diff:   0.3]
      [GC Worker Other (ms):  50.8  50.8  50.7  50.8  50.9  50.9  50.9  50.9
       Avg:  50.8, Min:  50.7, Max:  50.9, Diff:   0.2]
   [Clear CT:   1.1 ms]
   [Other:  57.4 ms]
      [Choose CSet:   0.1 ms]
      [Ref Proc:  49.8 ms]
      [Ref Enq:   0.1 ms]
      [Free CSet:   5.9 ms]
   [Eden: 1322M(1322M)->0B(1312M) Survivors: 68M->78M Heap: 4494M(6952M)->3182M(6952M)]
 [Times: user=9.12 sys=0.18, real=1.37 secs] 

There are no evacuation failures, humongous object allocations or full garbage collection occurrences to be seen... so far. There comes a point, though, at which there is no choice but to induce a full GC if the server is to be kept up.

There are 8 parallel workers; since ConcGCThreads is set to 4, I think I could either set ParallelGCThreads=16 or reduce ConcGCThreads to 2. I'm not sure which option is better; probably the former. But it might prove not so important after all.
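
The two alternatives can be sketched as flag fragments (assuming the rest of GC_OPTIONS stays as it is; the 1:4 Conc-to-Parallel ratio in both variants matches the JVM's default heuristic):

```shell
# Option A: raise the parallel workers to match ConcGCThreads=4 (1:4 ratio)
GC_THREADS_A="-XX:ParallelGCThreads=16 -XX:ConcGCThreads=4"

# Option B: keep the 8 observed parallel workers and lower the concurrent ones
GC_THREADS_B="-XX:ParallelGCThreads=8 -XX:ConcGCThreads=2"
```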

The reference processing times are consistently high. The famous Beckwith article mentions that:

If you see high times during reference processing then please turn on parallel reference processing by enabling the following option on the command line -XX:+ParallelRefProcEnabled.

This is something I definitely think I should do and surely will.

The main problem, however, is how to reduce the length of the GC pauses. A higher ParallelGCThreads could help, but perhaps the issue has something to do with an overly ambitious pause time goal; as the Oracle tutorial puts it:

Instead of using average response time (ART) as a metric to set the -XX:MaxGCPauseMillis=, consider setting a value that will meet the goal 90% of the time or more. This means 90% of users making a request will not experience a response time higher than the goal. Remember, the pause time is a goal and is not guaranteed to always be met.

So I think it could also help to set a more realistic MaxGCPauseMillis, say 600 ms. Should such a goal be met, most users would be perfectly happy; should pause times climb over 2 s, they probably would not. Do you think this parameter is a candidate for further tuning, or do you have any other suggestion?
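
Putting the candidate changes together, a revised GC_OPTIONS could look like the sketch below. This is an assumption-laden starting point, not a recommendation: ParallelGCThreads=8 simply matches the worker count observed in the log, ConcGCThreads=2 restores the default 1:4 ratio, and the WebLogic -D properties are carried over unchanged:

```shell
# Sketch of a revised GC_OPTIONS (starting values to iterate on, not a recommendation)
GC_OPTIONS=" -server -XX:+UseG1GC -XX:+DisableExplicitGC
             -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2
             -XX:MaxGCPauseMillis=600 -XX:GCPauseIntervalMillis=1000
             -XX:+ParallelRefProcEnabled
             -XX:+UseNUMA -XX:+UseCompressedOops -XX:ReservedCodeCacheSize=128m
             -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
             -Dweblogic.ThreadPoolPercentSocketReader=50 -Dweblogic.ThreadPoolSize=100
             -Dweblogic.SelfTunningThreadPoolSizeMin=100 "
```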

Regards

César Varona

1 Answer

The key G1 tuning flags are:

  • -XX:G1MixedGCLiveThresholdPercent: the occupancy threshold of live objects in an old region for it to be included in a mixed collection.
  • -XX:G1HeapWastePercent: the threshold of garbage that you can tolerate in the heap.
  • -XX:G1MixedGCCountTarget: the target number of mixed garbage collections within which the regions with at most G1MixedGCLiveThresholdPercent live data should be collected.
  • -XX:G1OldCSetRegionThresholdPercent: a limit on the maximum number of old regions that can be collected during a mixed collection.
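
As a starting point for experimenting with these four knobs, one could restate values near the JDK 8 defaults and adjust from observed mixed-collection behavior. The numbers below are illustrative assumptions, not recommendations; note also that some of these flags were experimental in certain 7u releases and may require -XX:+UnlockExperimentalVMOptions:

```shell
# Illustrative mixed-GC settings near the JDK 8 defaults (assumptions; tune from logs)
MIXED_GC_OPTS="-XX:G1MixedGCLiveThresholdPercent=85 \
               -XX:G1HeapWastePercent=5 \
               -XX:G1MixedGCCountTarget=8 \
               -XX:G1OldCSetRegionThresholdPercent=10"
```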

Your command line options should also contain GC logging: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.

Considering -XX:ParallelGCThreads, you could try e.g. half of the total number of "processors" (64 in your case) and go from there. Also, you need to consider that your ThreadPoolSize=100, so setting ParallelGCThreads=28 or less could be needed if you want to keep your pool busy.
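
A quick way to confirm the logical-processor count and derive that half-of-total starting value: on Solaris, psrinfo prints one line per virtual processor; the getconf fallback below is an assumption for non-Solaris shells, kept so the snippet runs anywhere:

```shell
#!/bin/sh
# Count logical processors: psrinfo on Solaris, getconf elsewhere (assumed fallback)
NCPU=$(psrinfo 2>/dev/null | wc -l)
[ "$NCPU" -gt 0 ] || NCPU=$(getconf _NPROCESSORS_ONLN)

# Starting point suggested above: half of the total logical processors
HALF=$(( NCPU / 2 ))
echo "try: -XX:ParallelGCThreads=$HALF"
```

On the 128-thread T4 box described in the question this would print 64, before accounting for the database's reserved share.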

Considering the G1 tuning options, please see the following resources:

Aleš
  • Thanks for your suggestions. As for the four key tuning flags mentioned by the Oracle tutorial, I have few clues as to whether it pays off to try to tune them, or what values to start with. I've noticed that roughly only 1 in 5 or 6 collections is mixed, and I guess it would be good to have as many mixed collections as possible, but I neither know for sure whether that can be accomplished by setting G1MixedGCLiveThresholdPercent and G1MixedGCCountTarget, nor have the slightest idea what values would be acceptable. – César Varona Jan 16 '14 at 18:43
  • Please take a look at that JavaOne 2013 talk; they give some examples of G1 tuning at the end. There is not much out there, but it's good for a start. Also, hotspot-gc-use@openjdk is a great mailing list and a good resource. – Aleš Jan 16 '14 at 19:43
  • And I have just noticed that you have the DB running, so you will really need to set ParallelGCThreads to lower values, as you mention. – Aleš Jan 16 '14 at 19:44
  • After carefully assessing my gc.log I think the key is one single very slow worker. I guess it must be dealing with either the system dictionary or the code cache. For the latter I'm not sure whether enabling code cache flushing would do, or whether I'd rather decrease its size. – César Varona Jan 20 '14 at 10:27