We recently made a change to our application's persistence mechanism, replacing a slow serialization via reflection to create strings of SQL statements that were executed against a MySQL database with serialization via OpenHFT Chronicle, with hard-coded appends of the various object's fields to the excerpt. As expected this decreased the amount of time it took to persist the transaction substantially and reduced our latency. But we noticed something puzzling about the garbage collection pattern before and after the change. While the we were generating less garbage, the amount of time each garbage collection took went way up, and we're having trouble seeing why.
Here's an example of two runs, before and after the change, with the same VM args running with Java 1.8.0_74 on RedHat 4.4.7CentOS 6.7. The "After" run appears to consume less young generation, and less total at the start of each GC, approximately the same heap at the end of the GC, but takes much more time to actually run the GC, and to run much less frequently (about 300s between GCs instead of 70-80s).
We're trying to figure out how this is happening, and whether there's anything we can do to get the GC pauses back down to where they were when we we generating garbage a lot faster. Is there something about the Chronicle and the way it uses off-heap memory-mapped files that causes this? It seems unlikely, but it seems to be implicated unless it really is just something about the pattern of how the garbage is being generated is causing it. FWIW, as an experiment we tried both G1 and CMS and, at least with default params, they show the same pattern and if anything worse times.
VM args:
-Xms75776M
-Xmx75776M
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Dsun.rmi.dgc.server.gcInterval=54000000
-Dsun.rmi.dgc.client.gcinterval=54000000
-XX:+DisableExplicitGC
-XX:MaxNewSize=19000M
-XX:+PrintGCApplicationStoppedTime
-XX:InitialSurvivorRatio=3
Before:
15009.601: [GC (Allocation Failure) [PSYoungGen: 19226640K->142320K(19273216K)] 22796282K->3814635K(77411840K), 0.0596127 secs] [Times: user=0.53 sys=0.05, real=0.06 secs]
15080.410: [GC (Allocation Failure) [PSYoungGen: 19235824K->137920K(19282432K)] 22908139K->3913367K(77421056K), 0.0570436 secs] [Times: user=0.50 sys=0.06, real=0.06 secs]
15154.485: [GC (Allocation Failure) [PSYoungGen: 19243712K->129840K(19279360K)] 23019159K->4007451K(77417984K), 0.0605860 secs] [Times: user=0.55 sys=0.05, real=0.06 secs]
.
.
bunch of lines omitted to get to approximately the same end-point
.
16070.663: [GC (Allocation Failure) [PSYoungGen: 19195664K->146464K(19262976K)] 24408408K->5460932K(77401600K), 0.0616580 secs] [Times: user=0.55 sys=0.06, real=0.06 secs]
After:
15313.256: [GC (Allocation Failure) [PSYoungGen: 15092194K->2230066K(15050752K)] 16440428K->3981360K(73189376K), 0.3563350 secs] [Times: user=3.40 sys=0.15, real=0.36 secs]
15770.413: [GC (Allocation Failure) [PSYoungGen: 15050546K->2922738K(15321600K)] 16801840K->4674033K(73460224K), 0.3584541 secs] [Times: user=3.56 sys=0.00, real=0.36 secs]
16090.016: [GC (Allocation Failure) [PSYoungGen: 14896370K->3571346K(15244288K)] 16647665K->5322641K(73382912K), 0.4276084 secs] [Times: user=4.25 sys=0.00, real=0.42 secs]
Update: So I performed the following experiment based on some of the suggestions below (this is on a UAT machine, not the PROD machine, so it's a little different... e.g. the linux version is CentOS 7.2 instead of 6.7). We limit the size of the heap to 32G and set an explicit GC pause goal of 200ms, as well as turning on more diagnostic info. If anything the results are even worse. It starts out quite reasonable but blows right by the suggested pause time up to 1.94 seconds.
java -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -Dsun.rmi.dgc.server.gcInterval=54000000 -Dsun.rmi.dgc.client.gcinterval=54000000 -XX:+DisableExplicitGC -Xms32000M -Xmx32000M
{Heap before GC invocations=1 (full 0):
PSYoungGen total 9557504K, used 4096257K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 50% used [0x00007f58fe580000,0x00007f59f85c0458,0x00007f5af2600000)
from space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
to space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
ParOldGen total 21845504K, used 0K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9000000,0x00007f58fe580000)
Metaspace used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
class space used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
2.420: [GC (Metadata GC Threshold)
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 4096257K->13270K(9557504K)] 4096257K->13366K(31403008K), 0.0181755 secs] [Times: user=0.11 sys=0.05, real=0.02 secs]
Heap after GC invocations=1 (full 0):
PSYoungGen total 9557504K, used 13270K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af32f59a8,0x00007f5b45b00000)
to space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
ParOldGen total 21845504K, used 96K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9018010,0x00007f58fe580000)
Metaspace used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
class space used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
}
{Heap before GC invocations=2 (full 1):
PSYoungGen total 9557504K, used 13270K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af32f59a8,0x00007f5b45b00000)
to space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
ParOldGen total 21845504K, used 96K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9018010,0x00007f58fe580000)
Metaspace used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
class space used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
2.438: [Full GC (Metadata GC Threshold) [PSYoungGen: 13270K->0K(9557504K)] [ParOldGen: 96K->12747K(21845504K)] 13366K->12747K(31403008K), [Metaspace: 20877K->20877K(1069056K)], 0.0530698 secs] [Times: user=0.27 sys=0.05, real=0.05 secs]
Heap after GC invocations=2 (full 1):
PSYoungGen total 9557504K, used 0K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
to space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
ParOldGen total 21845504K, used 12747K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c72e90,0x00007f58fe580000)
Metaspace used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
class space used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
}
{Heap before GC invocations=3 (full 1):
PSYoungGen total 9557504K, used 8192512K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
to space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
ParOldGen total 21845504K, used 12747K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c72e90,0x00007f58fe580000)
Metaspace used 22891K, capacity 23188K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
1332.241: [GC (Allocation Failure)
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 8192512K->291885K(9557504K)] 8205259K->304649K(31403008K), 0.2443282 secs] [Times: user=0.48 sys=3.83, real=0.25 secs]
Heap after GC invocations=3 (full 1):
PSYoungGen total 9557504K, used 291885K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
from space 1364992K, 21% used [0x00007f5b45b00000,0x00007f5b5780b650,0x00007f5b99000000)
to space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
ParOldGen total 21845504K, used 12763K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c76e90,0x00007f58fe580000)
Metaspace used 22891K, capacity 23188K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}
{Heap before GC invocations=4 (full 1):
PSYoungGen total 9557504K, used 8484397K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
from space 1364992K, 21% used [0x00007f5b45b00000,0x00007f5b5780b650,0x00007f5b99000000)
to space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
ParOldGen total 21845504K, used 12763K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c76e90,0x00007f58fe580000)
Metaspace used 22939K, capacity 23252K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
2858.320: [GC (Allocation Failure)
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 8484397K->677978K(9557504K)] 8497161K->690750K(31403008K), 0.5290410 secs] [Times: user=0.74 sys=8.70, real=0.53 secs]
Heap after GC invocations=4 (full 1):
PSYoungGen total 9557504K, used 677978K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
from space 1364992K, 49% used [0x00007f5af2600000,0x00007f5b1bc16a80,0x00007f5b45b00000)
to space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
ParOldGen total 21845504K, used 12771K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c78e90,0x00007f58fe580000)
Metaspace used 22939K, capacity 23252K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}
{Heap before GC invocations=5 (full 1):
PSYoungGen total 9557504K, used 8870490K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
from space 1364992K, 49% used [0x00007f5af2600000,0x00007f5b1bc16a80,0x00007f5b45b00000)
to space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
ParOldGen total 21845504K, used 12771K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c78e90,0x00007f58fe580000)
Metaspace used 22946K, capacity 23252K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
4467.385: [GC (Allocation Failure)
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 8870490K->1096779K(9557504K)] 8883262K->1109558K(31403008K), 0.9701847 secs] [Times: user=1.96 sys=15.40, real=0.97 secs]
Heap after GC invocations=5 (full 1):
PSYoungGen total 9557504K, used 1096779K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
from space 1364992K, 80% used [0x00007f5b45b00000,0x00007f5b88a12d00,0x00007f5b99000000)
to space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
ParOldGen total 21845504K, used 12779K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c7ae90,0x00007f58fe580000)
Metaspace used 22946K, capacity 23252K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}
{Heap before GC invocations=6 (full 1):
PSYoungGen total 9557504K, used 9289291K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
from space 1364992K, 80% used [0x00007f5b45b00000,0x00007f5b88a12d00,0x00007f5b99000000)
to space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
ParOldGen total 21845504K, used 12779K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c7ae90,0x00007f58fe580000)
Metaspace used 22951K, capacity 23252K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
6153.256: [GC (Allocation Failure)
Desired survivor size 2375548928 bytes, new threshold 6 (max 15)
[PSYoungGen: 9289291K->1364987K(9557504K)] 9302070K->2238887K(31403008K), 1.9590450 secs] [Times: user=5.38 sys=29.68, real=1.96 secs]
Heap after GC invocations=6 (full 1):
PSYoungGen total 9557504K, used 1364987K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
from space 1364992K, 99% used [0x00007f5af2600000,0x00007f5b45afed70,0x00007f5b45b00000)
to space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
ParOldGen total 21845504K, used 873899K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
object space 21845504K, 4% used [0x00007f53c9000000,0x00007f53fe56ae90,0x00007f58fe580000)
Metaspace used 22951K, capacity 23252K, committed 23472K, reserved 1071104K
class space used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}