
I have an instance of zookeeper that has been running for some time... (Java 1.7.0_131, ZK 3.5.1-1), with -Xmx10G -XX:+UseParallelGC.

Recently there was a leadership change, and the memory usage on most instances in the quorum went from ~200MB to 2GB+. I took a jmap dump, and the interesting finding was a large amount of byte[] serialization data (>1GB) that had no GC root but hadn't been collected.

(The arrays are held through ByteArrayOutputStream, DataOutputStream and org.apache.jute.BinaryOutputArchive, or through HeapByteBuffer and BinaryOutputArchive.)

Looking at the GC log, shortly before the leadership change a full GC was running every 4-5 minutes. After the election, the tenuring threshold increased from 1 to 15 (the maximum) and full GCs ran less and less often; on some days none ran at all.

After several days, suddenly and (to me) mysteriously, something changes: memory usage plummets back to ~200MB and a full GC runs every 4-5 minutes again.

What confuses me is how so much memory can have no GC root and still not be collected by a full GC. I even tried triggering GC.run from jcmd a few times.
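For comparison, the same kind of check can be sketched from inside a JVM using the standard java.lang.management MXBeans. This is only a minimal sketch: the class name GcSnapshot and its helper methods are made up for illustration, it measures the JVM it runs in (not a remote ZooKeeper process), and System.gc() is just a request that the JVM may ignore, for example under -XX:+DisableExplicitGC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of ZooKeeper or the JDK.
class GcSnapshot {
    /** Formats one collector's cumulative counters as a single log line. */
    static String describe(String name, long count, long timeMs) {
        return name + ": collections=" + count + ", totalTimeMs=" + timeMs;
    }

    /** Heap bytes currently in use, as reported by the memory MXBean. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        long before = heapUsedBytes();
        System.gc(); // only a request; the JVM is free to ignore it
        long after = heapUsedBytes();
        System.out.println("heap used before=" + before + " after=" + after);

        // One line per collector; the counters are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
    }
}
```

If the old-generation collector's count does not increase after the request, the explicit GC was probably not honoured, which would be worth ruling out before suspecting a leak.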

I wondered if something in ZK native land was holding onto this memory, or leaking this memory... which could explain it.

I'm looking for any debugging suggestions. I'm planning on upgrading to Java 1.8, and maybe ZK 3.5.4, but I would really like to find the root cause before moving on.

So far I've used visualvm, GCviewer and Eclipse MAT.

[Image: JVM max memory usage]

[Image: GC log view. Solid vertical black lines are full GCs; yellow is the young generation.]

Alun

1 Answer


I am not an expert on ZK. However, I have been tuning JVMs on WebLogic for a while, and based on this information I suspect that your configuration (-Xmx10G -XX:+UseParallelGC, with no -Xms) lets the heap expand and shrink. You could try setting -Xms10G together with -Xmx10G to avoid this resizing. According to the Oracle guide quoted below, each resize triggers a full GC, so avoiding resizes is a good way to reduce the number of full garbage collections.

Please read this passage:

"When a Hotspot JVM starts, the heap, the young generation and the perm generation space are allocated to their initial sizes determined by the -Xms, -XX:NewSize, and -XX:PermSize parameters respectively, and increment as-needed to the maximum reserved size, which are -Xmx, -XX:MaxNewSize, and -XX:MaxPermSize. The JVM may also shrink the real size at runtime if the memory is not needed as much as originally specified. However, each resizing activity triggers a Full Garbage Collection (GC), and therefore impacts performance. As a best practice, we recommend that you make the initial and maximum sizes identical"

Source: http://www.oracle.com/us/products/applications/aia-11g-performance-tuning-1915233.pdf
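To confirm that a running JVM was really started with identical initial and maximum heap sizes, something along these lines could be used. It is a rough sketch under some assumptions: HeapFlagCheck and its methods are hypothetical names, it only looks at the -Xms/-Xmx command-line flags (not -XX:InitialHeapSize or -XX:MaxHeapSize), and it only understands the k, m and g size suffixes.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of the JDK or ZooKeeper.
class HeapFlagCheck {
    /** Parses a size such as "10G", "512m" or "1048576" into bytes. */
    static long parseSize(String s) {
        char unit = Character.toLowerCase(s.charAt(s.length() - 1));
        long mult;
        switch (unit) {
            case 'k': mult = 1L << 10; break;
            case 'm': mult = 1L << 20; break;
            case 'g': mult = 1L << 30; break;
            default: return Long.parseLong(s); // no suffix: plain bytes
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * mult;
    }

    /** Value of the last flag starting with the given prefix, or -1 if absent. */
    static long flagBytes(List<String> args, String prefix) {
        long value = -1;
        for (String a : args) {
            if (a.startsWith(prefix)) {
                value = parseSize(a.substring(prefix.length()));
            }
        }
        return value;
    }

    /** True when both flags are present and name the same size. */
    static boolean heapIsFixed(List<String> args) {
        long xms = flagBytes(args, "-Xms");
        long xmx = flagBytes(args, "-Xmx");
        return xms > 0 && xms == xmx;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, e.g. [-Xms10G, -Xmx10G, -XX:+UseParallelGC]
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("fixed heap size: " + heapIsFixed(jvmArgs));
    }
}
```

The flags are compared rather than the values reported by the memory MXBean because, with the parallel collector, the reported maximum can be smaller than -Xmx, so comparing the reported initial and maximum sizes could be misleading.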

If you could provide your gc.log, it would be useful to analyse this case thoroughly.

Best regards, RCC

rcastellcastell