
I am currently trying to tune Apache NiFi to get it to work with a high-throughput flow, but I cannot avoid full GCs.

When the flow starts, very quick young GCs occur, but they cannot keep up with the allocation demand, and eventually a full GC is triggered. This happens with different heap sizes (from 8GB to 50GB) and a basic configuration (only the region size and the GC thread counts have been set, following the Oracle documentation):

-XX:ConcGCThreads=3
-XX:G1HeapRegionSize=16
-XX:InitialHeapSize=34359738368
-XX:MaxHeapSize=34359738368
-XX:ParallelGCThreads=10
-XX:+ParallelRefProcEnabled
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintGC
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
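
For scale, here is a small sketch of what the heap and region settings above work out to. It assumes the region size of 16 is meant as 16 MB (the flag as listed carries no unit, so this is an assumption), and it uses the G1 rule that objects of half a region or larger are treated as humongous. The function names are only illustrative.

```python
# Values taken from the flag list above.
HEAP_BYTES = 34359738368            # -XX:MaxHeapSize / -XX:InitialHeapSize

# ASSUMPTION: "-XX:G1HeapRegionSize=16" is intended as 16 MB regions.
REGION_BYTES = 16 * 1024 * 1024


def region_count(heap_bytes, region_bytes):
    """Number of G1 regions the heap is divided into."""
    return heap_bytes // region_bytes


def humongous_threshold(region_bytes):
    """Smallest object size (bytes) that G1 allocates as humongous."""
    return region_bytes // 2


if __name__ == "__main__":
    print("regions:", region_count(HEAP_BYTES, REGION_BYTES))
    print("humongous from (bytes):", humongous_threshold(REGION_BYTES))
```

Under that assumption the heap is split into 2048 regions, and any single allocation of 8 MiB or more would go straight into humongous regions.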

The following is the GC dump analysis performed using GCViewer:

[Image: GCViewer chart of the GC log]

Looking at the logs, I've noticed that the count of old regions selected for mixed GCs is always zero. According to this really useful article, you can increase that count by lowering InitiatingHeapOccupancyPercent, which allows the marking phase to start earlier. This does not seem to have any effect, since the count of old regions selected remains zero:

[G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1777 regions, survivors: 251 regions, old: 0 regions, predicted pause time: 216.28 ms, target pause time: 200.00 ms]
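
To check this across a whole log rather than by eye, a minimal sketch like the following could pull the region counts out of each CSet line. The regular expression is written against the exact line format quoted above; other JVM versions may print it differently, and `parse_cset` and `count_with_old_regions` are hypothetical helper names.

```python
import re

# Matches the "finish choosing CSet" line in the format quoted above.
CSET_RE = re.compile(
    r"finish choosing CSet, eden: (\d+) regions, survivors: (\d+) regions, "
    r"old: (\d+) regions, predicted pause time: ([\d.]+) ms, "
    r"target pause time: ([\d.]+) ms"
)


def parse_cset(line):
    """Return the CSet figures from one log line, or None if it is not a CSet line."""
    m = CSET_RE.search(line)
    if m is None:
        return None
    return {
        "eden": int(m.group(1)),
        "survivors": int(m.group(2)),
        "old": int(m.group(3)),
        "predicted_ms": float(m.group(4)),
        "target_ms": float(m.group(5)),
    }


def count_with_old_regions(lines):
    """Count CSet lines in which at least one old region was selected."""
    parsed = (parse_cset(line) for line in lines)
    return sum(1 for p in parsed if p is not None and p["old"] > 0)


SAMPLE = (
    "[G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1777 regions, "
    "survivors: 251 regions, old: 0 regions, predicted pause time: 216.28 ms, "
    "target pause time: 200.00 ms]"
)

if __name__ == "__main__":
    print(parse_cset(SAMPLE))
```

Feeding every line of the GC log through `count_with_old_regions` gives the number of collections that actually picked up old regions; a result of zero matches what is described above.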

According to the Oracle documentation, there are experimental flags I could use, such as G1MixedGCLiveThresholdPercent=65, but even if I put the UnlockExperimentalVMOptions flag before everything else, I get the following error:

2016-10-10 14:48:14,285 ERROR [NiFi logging handler] org.apache.nifi.StdErr Error: VM option 'G1MixedGCLiveThresholdPercent' is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions

Basically, the flag is ignored.
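
One thing worth ruling out is the order in which the options actually reach the JVM, since the flag list printed in the GC log is sorted and does not show it. As far as I know, HotSpot reads its options in order, so an experimental option that arrives before -XX:+UnlockExperimentalVMOptions is rejected. The sketch below is a plain helper for checking an argument list in launch order; how that list is obtained (for example from the process command line) is left open, and `flags_before_unlock` is a hypothetical name.

```python
UNLOCK = "-XX:+UnlockExperimentalVMOptions"


def flags_before_unlock(argv, experimental_names):
    """Return the experimental -XX options that appear before the unlock flag.

    argv               -- JVM arguments in the order the JVM received them
    experimental_names -- option names to treat as experimental (caller-supplied)
    """
    try:
        unlock_at = argv.index(UNLOCK)
    except ValueError:
        # Unlock flag missing entirely: every experimental option is "too early".
        unlock_at = len(argv)

    early = []
    for arg in argv[:unlock_at]:
        if not arg.startswith("-XX:"):
            continue
        # Drop "-XX:", an optional +/- toggle, and any "=value" suffix.
        name = arg[4:].lstrip("+-").split("=", 1)[0]
        if name in experimental_names:
            early.append(arg)
    return early


if __name__ == "__main__":
    args = [
        "-XX:G1MixedGCLiveThresholdPercent=65",
        "-XX:+UnlockExperimentalVMOptions",
        "-XX:+UseG1GC",
    ]
    print(flags_before_unlock(args, {"G1MixedGCLiveThresholdPercent"}))
```

An empty result means every listed experimental option comes after the unlock flag; anything returned is an option that would be seen too early.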

First of all, could the fact that old regions are never collected be what triggers the full GCs? If so, how can I get the old regions collected and free enough memory for the application's needs?

Thank you all for your help.

riccamini
  • It could be "humongous" regions, since they are allocated directly in the old gen space. What's the nature of your application? Any really big objects (half of a G1 region size) being allocated? – bashnesnos Oct 10 '16 at 15:39
  • I've searched for **humongous** allocations in the log as explained in the guide, no entry is found. – riccamini Oct 10 '16 at 19:24
  • Could you update your question with all your JVM flags for the provided graph? – bashnesnos Oct 11 '16 at 07:24
  • Updated with all the flags as printed by the gc log – riccamini Oct 11 '16 at 08:41
  • It seems that you have a pretty big live data set (29G still occupied after a Full GC, which would be 85% of your entire heap), which means that the objects residing in the old regions are simply not eligible for collection. If you don't have any sort of caching that might consume that much, it might be a memory leak. What if you reproduce this issue with a smaller heap size, take a heap dump, and analyze what kinds of objects consume the space? – bashnesnos Oct 11 '16 at 09:59
  • Thanks, I will give it a shot. I am still on the learning curve for this type of issue. – riccamini Oct 12 '16 at 14:48
