So, we're using G1 GC. and 18GB heap. Young generation size is about 3,5G. And heap maximum usage is about 12G. And memory is full of short-living objects.
Also, maybe it's important that Couchbase instance is running on the same node near the JVM. It takes all iops often when it's persisting changes on hdd, but there are enough free cpu time and memory.
Enabled JVM options:
-Xmx18g -Xms18g -XX:MaxPermSize=512M -XX:+UseG1GC -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled
My sad problem is young generation GC pauses. Long pause GC log usually looks like:
2013-07-10T15:06:25.963+0400: 9122,066: [GC pause (young)
Desired survivor size 243269632 bytes, new threshold 5 (max 15)
- age 1: 69789280 bytes, 69789280 total
- age 2: 58618240 bytes, 128407520 total
- age 3: 54519720 bytes, 182927240 total
- age 4: 51592728 bytes, 234519968 total
- age 5: 45186520 bytes, 279706488 total
9122,066: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 174,16 ms, remaining time: 25,84 ms, target pause time: 200,00 ms]
9122,066: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 426 regions, survivors: 34 regions, predicted young region time: 164,97 ms]
9122,066: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 426 regions, survivors: 34 regions, old: 0 regions, predicted pause time: 339,13 ms, target pause time: 200,00 ms]
9122,259: [SoftReference, 0 refs, 0,0063780 secs]9124,575: [WeakReference, 4 refs, 0,0031600 secs]9124,578: [FinalReference, 1640 refs, 0,0033730 secs]9124,581: [PhantomReference, 145 refs, 0,0032080 secs]9124,5
85: [JNI Weak Reference, 0,0000810 secs], 2,53669600 secs]
[Parallel Time: 190,5 ms]
[GC Worker Start (ms): 9122066,6 9122066,7 9122066,7 9122066,7 9122066,8 9122066,9 9122066,9 9122066,9 9122066,9 9122067,0 9122067,0 9122067,1 9122067,1 9122067,1 9122067,1 9122067,2 91220
67,2 9122067,3
Avg: 9122067,0, Min: 9122066,6, Max: 9122067,3, Diff: 0,7]
[Ext Root Scanning (ms): 4,7 6,0 4,8 4,5 4,2 4,3 4,2 4,3 4,6 3,4 13,5 5,2 4,2 5,6 4,2 4,1 4,3 4,0
Avg: 5,0, Min: 3,4, Max: 13,5, Diff: 10,1]
[Update RS (ms): 20,9 19,6 21,1 21,3 21,2 21,2 21,3 21,2 21,7 21,5 12,1 20,2 21,1 19,4 21,0 21,1 20,7 21,2
Avg: 20,4, Min: 12,1, Max: 21,7, Diff: 9,6]
[Processed Buffers : 27 23 25 29 31 22 25 34 28 14 36 23 24 22 28 24 25 24
Sum: 464, Avg: 25, Min: 14, Max: 36, Diff: 22]
[Scan RS (ms): 9,0 9,2 8,7 8,8 9,1 9,1 8,9 9,1 8,3 9,2 9,0 9,1 9,2 9,2 9,1 9,0 9,0 9,1
Avg: 9,0, Min: 8,3, Max: 9,2, Diff: 1,0]
[Object Copy (ms): 145,1 145,0 145,2 145,1 145,1 144,9 145,1 144,9 144,9 145,4 144,8 144,8 144,8 145,0 145,0 145,1 145,2 144,9
Avg: 145,0, Min: 144,8, Max: 145,4, Diff: 0,6]
[Termination (ms): 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0
Avg: 0,0, Min: 0,0, Max: 0,0, Diff: 0,0]
[Termination Attempts : 5 8 2 11 5 6 6 5 5 7 4 7 2 9 8 5 7 8
Sum: 110, Avg: 6, Min: 2, Max: 11, Diff: 9]
[GC Worker End (ms): 9122246,4 9122246,6 9122246,7 9122246,6 9122246,7 9122246,7 9122246,5 9122246,7 9122246,5 9122246,5 9122246,6 9122246,7 9122246,8 9122246,4 9122246,6 9122246,5 9122246
,7 9122246,8
Avg: 9122246,6, Min: 9122246,4, Max: 9122246,8, Diff: 0,3]
[GC Worker (ms): 179,8 179,9 180,0 179,8 179,9 179,9 179,6 179,8 179,6 179,5 179,6 179,6 179,7 179,3 179,5 179,4 179,4 179,5
Avg: 179,7, Min: 179,3, Max: 180,0, Diff: 0,7]
[GC Worker Other (ms): 10,7 10,7 10,8 10,8 10,9 10,9 11,0 11,0 11,0 11,1 11,1 11,1 11,2 11,2 11,2 11,2 11,3 11,3
Avg: 11,0, Min: 10,7, Max: 11,3, Diff: 0,6]
[Clear CT: 2,8 ms]
[Other: 2343,4 ms]
[Choose CSet: 0,1 ms]
[Ref Proc: 2327,7 ms]
[Ref Enq: 1,9 ms]
[Free CSet: 8,2 ms]
[Eden: 3408M(3408M)->0B(3400M) Survivors: 272M->280M Heap: 9998M(18432M)->6638M(18432M)]
[Times: user=3,26 sys=0,02, real=2,54 secs]
Total time for which application threads were stopped: 2,5434370 seconds
The only phase of GC which makes problems is 'Reference processing'. But the log looks strange: soft, weak, final and jni references processing took very little time. But overall time is 2,5 seconds. Or it can be even more, up to 10 seconds in worst cases.
Another pause (more comfortable) may look like:
2013-07-10T16:26:11.862+0400: 13907,965: [GC pause (young)
Desired survivor size 243269632 bytes, new threshold 4 (max 15)
- age 1: 69125832 bytes, 69125832 total
- age 2: 58756480 bytes, 127882312 total
- age 3: 52397376 bytes, 180279688 total
- age 4: 88850424 bytes, 269130112 total
13907,965: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 77,38 ms, remaining time: 122,62 ms, target pause time: 200,00 ms]
13907,965: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 427 regions, survivors: 33 regions, predicted young region time: 167,95 ms]
13907,965: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 427 regions, survivors: 33 regions, old: 0 regions, predicted pause time: 245,33 ms, target pause time: 200,00 ms]
13908,155: [SoftReference, 0 refs, 0,0041340 secs]13908,160: [WeakReference, 0 refs, 0,0023850 secs]13908,162: [FinalReference, 1393 refs, 0,0065970 secs]13908,169: [PhantomReference, 108 refs, 0,0018650 secs]13908,171: [JNI Weak Reference, 0,0000630 secs], 0,22008100 secs]
[Parallel Time: 188,4 ms]
[GC Worker Start (ms): 13907965,3 13907965,3 13907965,4 13907965,4 13907965,5 13907965,5 13907965,6 13907965,6 13907965,6 13907965,7 13907965,7 13907965,7 13907965,8 13907965,8 13907965,8 13907965,9 13907965,9 13907965,9
Avg: 13907965,6, Min: 13907965,3, Max: 13907965,9, Diff: 0,6]
[Ext Root Scanning (ms): 5,8 5,0 6,8 6,3 6,1 6,2 6,0 6,3 5,2 4,2 5,0 6,2 4,5 6,0 17,1 4,4 6,2 5,3
Avg: 6,3, Min: 4,2, Max: 17,1, Diff: 12,9]
[Update RS (ms): 24,8 26,0 23,9 24,1 24,1 24,1 24,2 23,9 25,0 25,2 25,1 24,1 26,0 24,3 13,7 25,7 24,2 24,7
Avg: 24,1, Min: 13,7, Max: 26,0, Diff: 12,2]
[Processed Buffers : 30 20 9 16 16 19 20 21 22 12 30 17 17 20 12 20 17 22
Sum: 340, Avg: 18, Min: 9, Max: 30, Diff: 21]
[Scan RS (ms): 7,5 7,1 7,2 7,5 7,6 7,5 7,5 7,6 7,1 7,4 7,6 7,2 7,2 7,4 7,2 7,5 7,0 7,7
Avg: 7,4, Min: 7,0, Max: 7,7, Diff: 0,7]
[Object Copy (ms): 133,1 133,1 133,2 133,1 133,2 133,1 133,2 133,1 133,5 134,0 133,0 133,2 133,0 132,9 132,6 133,1 133,2 132,9
Avg: 133,1, Min: 132,6, Max: 134,0, Diff: 1,3]
[Termination (ms): 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0
Avg: 0,0, Min: 0,0, Max: 0,0, Diff: 0,0]
[Termination Attempts : 1 3 1 1 1 1 1 1 1 1 2 3 2 1 1 1 1 1
Sum: 24, Avg: 1, Min: 1, Max: 3, Diff: 2]
[GC Worker End (ms): 13908136,6 13908136,9 13908136,5 13908136,7 13908136,7 13908136,8 13908136,7 13908136,7 13908136,8 13908136,8 13908136,5 13908136,6 13908136,5 13908136,5 13908136,5 13908136,5 13908136,8 13908136,6
Avg: 13908136,7, Min: 13908136,5, Max: 13908136,9, Diff: 0,4]
[GC Worker (ms): 171,3 171,6 171,1 171,2 171,2 171,3 171,1 171,1 171,1 171,2 170,8 170,9 170,7 170,7 170,7 170,6 171,0 170,7
Avg: 171,0, Min: 170,6, Max: 171,6, Diff: 0,9]
[GC Worker Other (ms): 17,2 17,2 17,3 17,3 17,4 17,4 17,5 17,5 17,5 17,5 17,6 17,6 17,7 17,7 17,7 17,7 17,8 17,8
Avg: 17,5, Min: 17,2, Max: 17,8, Diff: 0,6]
[Clear CT: 1,6 ms]
[Other: 30,1 ms]
[Choose CSet: 0,1 ms]
[Ref Proc: 17,1 ms]
[Ref Enq: 0,9 ms]
[Free CSet: 7,4 ms]
[Eden: 3416M(3416M)->0B(3456M) Survivors: 264M->224M Heap: 7289M(18432M)->3912M(18432M)]
[Times: user=3,16 sys=0,00, real=0,22 secs]
Reference processing is still longest phase, but it's much shorter. ParallelRefProcEnabled wasn't be a cure for my problem. I've also tried to change size of young gen. It also not helped. Setting different -XX:MaxGCPauseMillis, more relaxed 600ms or more strict 100, results in still bad throughput.
CMS performance is even worse than G1 with parameters:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
Young gen collections going longer and more often.
I'm totally confused with described logs. Tomorrow I'll try to move Couchbase instance to another node in order to check if it's freezes GC threads.
But, if Couchbase is not the point, maybe someone could explain me meaning of the logs. Or maybe there are some magic CMS parameters to fix that thing.
I'll be very glad for any help!