My application is deployed on WebLogic running on Solaris, on a dual-socket SPARC T4 (8 cores per socket, 3.0 GHz). This WebLogic instance uses the G1 GC, and I think the current configuration can be improved:
GC_OPTIONS=" -server -XX:ConcGCThreads=4 -XX:+UseG1GC -XX:+DisableExplicitGC
-XX:MaxGCPauseMillis=300 -XX:GCPauseIntervalMillis=1000 -XX:+UseNUMA
-XX:+UseCompressedOops -XX:ReservedCodeCacheSize=128m
-Dweblogic.ThreadPoolPercentSocketReader=50 -Dweblogic.ThreadPoolSize=100
-Dweblogic.SelfTuningThreadPoolSizeMin=100 "
It strikes me that ConcGCThreads has been set without also setting ParallelGCThreads. A good starting point would be to keep these two values in a sensible proportion. Oracle's documentation says:
-XX:ParallelGCThreads=n
Sets the value of the STW worker threads. Sets the value of n to the number of logical processors. The value of n is the same as the number of logical processors up to a value of 8.
If there are more than eight logical processors, sets the value of n to approximately 5/8 of the logical processors. This works in most cases except for larger SPARC systems where the value of n can be approximately 5/16 of the logical processors.
There is no clear statement of what a "logical processor" is. From what I've found on the web, it can be understood as a hardware thread running on a physical core. The number of "logical processors" in the rack this WebLogic runs on would thus be 128 (two 8-core processors, each core "with the ability to switch between 8 threads", according to http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/o11-090-sparc-t4-arch-496245.pdf).
But I'm told that 64 of these 128 "logical processors" are reserved for the database, the remainder being shared between Tuxedo and the WebLogic servers. Assuming there are two WebLogic instances, and that it's safe to consider that the Tuxedo and WebLogic instances consume the same number of threads, it could be argued that (64/3) * (5/16) ≈ 6 or 7 ParallelGCThreads, and thus 1 or at most 2 ConcGCThreads, are acceptable values.
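To make the arithmetic concrete, here is a sketch of the thread settings this reasoning leads to. The 64/3 split is my assumption about how the box is shared, GC_THREAD_OPTIONS is just an illustrative variable name, and the ConcGCThreads value follows G1's usual default of roughly a quarter of ParallelGCThreads:
# ~64 shared logical processors / 3 consumers (Tuxedo + 2 WebLogic instances) ~= 21 per instance
# STW workers on a large SPARC system: 21 * 5/16 ~= 6.6, rounded up to 7
# concurrent marking threads: roughly ParallelGCThreads / 4 ~= 2
GC_THREAD_OPTIONS=" -XX:ParallelGCThreads=7 -XX:ConcGCThreads=2 "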
Do you think these are reasonable values to start the tuning? Any suggestion is welcome.
Edit: as of today I have some logs generated with -XX:+PrintGCDetails enabled. Here's how it looks in GCViewer:
My interpretation:
- heap usage slowly grows as users do their tasks
- tenured heap usage (the magenta line under the blue zigzags, which stand for overall used heap) grows too, although there is still a fair amount of space available in the tenured generation
- by contrast, the headroom in the young generation is quite scarce, and it needs to be garbage collected steadily
- although there is nothing immediately disquieting about this picture, the trend is upwards; moreover, the GC pause times (slightly more than 1 s if no initial mark is involved, almost 2 s otherwise) are far longer than the target of 300 ms
Here is a sample entry from the garbage collection log:
2014-01-16T11:18:12.728+0100: 50293.457: [GC pause (young), 1.36713100 secs]
[Parallel Time: 1308.6 ms]
[GC Worker Start (ms): 50293458.0 50293458.0 50293458.0 50293458.1 50293458.1 50293458.1 50293458.2 50293458.2
Avg: 50293458.1, Min: 50293458.0, Max: 50293458.2, Diff: 0.2]
[Ext Root Scanning (ms): 982.5 174.5 146.2 150.6 170.6 139.6 154.5 158.8
Avg: 259.7, Min: 139.6, Max: 982.5, Diff: 842.9]
[Update RS (ms): 0.0 16.9 36.2 42.3 18.6 54.6 38.8 34.9
Avg: 30.3, Min: 0.0, Max: 54.6, Diff: 54.6]
[Processed Buffers : 0 15 21 31 18 27 11 21
Sum: 144, Avg: 18, Min: 0, Max: 31, Diff: 31]
[Scan RS (ms): 0.2 9.8 9.7 8.7 10.0 10.0 8.1 9.0
Avg: 8.2, Min: 0.2, Max: 10.0, Diff: 9.8]
[Object Copy (ms): 275.1 132.6 142.2 131.8 133.9 129.4 131.9 130.5
Avg: 150.9, Min: 129.4, Max: 275.1, Diff: 145.6]
[Termination (ms): 0.0 924.0 923.4 924.2 924.5 924.0 924.3 924.5
Avg: 808.6, Min: 0.0, Max: 924.5, Diff: 924.5]
[Termination Attempts : 1 1049 1140 1011 881 979 894 780
Sum: 6735, Avg: 841, Min: 1, Max: 1140, Diff: 1139]
[GC Worker End (ms): 50294715.8 50294715.9 50294716.0 50294715.9 50294715.9 50294715.9 50294715.9 50294715.9
Avg: 50294715.9, Min: 50294715.8, Max: 50294716.0, Diff: 0.1]
[GC Worker (ms): 1257.9 1257.9 1257.9 1257.9 1257.7 1257.8 1257.7 1257.7
Avg: 1257.8, Min: 1257.7, Max: 1257.9, Diff: 0.3]
[GC Worker Other (ms): 50.8 50.8 50.7 50.8 50.9 50.9 50.9 50.9
Avg: 50.8, Min: 50.7, Max: 50.9, Diff: 0.2]
[Clear CT: 1.1 ms]
[Other: 57.4 ms]
[Choose CSet: 0.1 ms]
[Ref Proc: 49.8 ms]
[Ref Enq: 0.1 ms]
[Free CSet: 5.9 ms]
[Eden: 1322M(1322M)->0B(1312M) Survivors: 68M->78M Heap: 4494M(6952M)->3182M(6952M)]
[Times: user=9.12 sys=0.18, real=1.37 secs]
There are no evacuation failures, humongous object allocations or full garbage collection occurrences to be seen... so far. One thing the breakdown above does show is that a single worker spent 982.5 ms in Ext Root Scanning while the other seven waited roughly 924 ms each in Termination, which accounts for most of the pause. Eventually, though, a point is reached where there is no choice but to induce a full GC if the server is to be kept up.
There are 8 parallel workers; since ConcGCThreads is set to 4, I think I could either set ParallelGCThreads=16 or reduce ConcGCThreads to 2. I'm not sure which option is better; probably the former. But it might turn out not to be so important after all.
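In flag form, the two alternatives would be as follows (GC_THREAD_OPTIONS is again just an illustrative name; only one of the two lines would be used):
# option A: raise the STW worker count to restore the usual ~4:1 ratio
GC_THREAD_OPTIONS=" -XX:ParallelGCThreads=16 -XX:ConcGCThreads=4 "
# option B: keep the 8 workers the JVM chose and lower the marking threads
GC_THREAD_OPTIONS=" -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 "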
The reference processing times are consistently high. The famous Beckwith article mentions that:
If you see high times during reference processing then please turn on parallel reference processing by enabling the following option on the command line -XX:+ParallelRefProcEnabled.
This is something I definitely should do, and will.
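Appended to the option string above, that change would simply be:
GC_OPTIONS="${GC_OPTIONS} -XX:+ParallelRefProcEnabled "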
The main problem, however, is how to reduce the length of the GC pauses. A higher ParallelGCThreads could help, but perhaps the issue has to do with an overly ambitious pause time goal; as the Oracle tutorial puts it:
Instead of using average response time (ART) as a metric to set the -XX:MaxGCPauseMillis=, consider setting a value that will meet the goal 90% of the time or more. This means 90% of users making a request will not experience a response time higher than the goal. Remember, the pause time is a goal and is not guaranteed to always be met.
So I think it could also help to set a more realistic MaxGCPauseMillis, say 600 ms. If such a goal were met, most users would be perfectly happy; if pause times climbed over 2 s, they probably would not be. Do you think this parameter is a candidate for further tuning, or do you have any other suggestions?
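For completeness, putting the candidate changes together, the revised option string I have in mind would look roughly like this (thread counts per option B above; all of these are starting points to be validated against the logs, not measured optima):
GC_OPTIONS=" -server -XX:+UseG1GC -XX:+DisableExplicitGC
-XX:MaxGCPauseMillis=600 -XX:GCPauseIntervalMillis=1000
-XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 -XX:+ParallelRefProcEnabled
-XX:+UseNUMA -XX:+UseCompressedOops -XX:ReservedCodeCacheSize=128m
-Dweblogic.ThreadPoolPercentSocketReader=50 -Dweblogic.ThreadPoolSize=100
-Dweblogic.SelfTuningThreadPoolSizeMin=100 "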
Regards