
I have a service running on a system with 16GB of RAM with the following configuration:

```
-Xms6144M
-Xmx6144M
-XX:+UseG1GC
-XX:NewSize=1500M
-XX:NewSize=1800M
-XX:MaxNewSize=2100M
-XX:NewRatio=2
-XX:SurvivorRatio=12
-XX:MaxGCPauseMillis=100
-XX:MaxGCPauseMillis=1000
-XX:GCTimeRatio=9
-XX:-UseAdaptiveSizePolicy
-XX:+PrintAdaptiveSizePolicy
```

It has around 20 pollers running, each with a ThreadPoolExecutor of size 30 for processing messages. For the first 5-6 hours it was able to process around 130 messages per second; thereafter it was able to process only around 40 messages per second.

I analyzed the GC logs and found that Full GC had become very frequent and more than 1000MB of data was being promoted from the Young to the Old Generation:

[GC log charts: YoungGen, OldGen, PromotionYoung, GCPause]
Looking at the heap dump, I see lots of threads in WAITING state, similar to this: `WAITING at sun.misc.Unsafe.park(Native Method)`. The following classes' objects account for most of the retained size: [heap dump screenshot]

I think there may be a small memory leak in the service or its associated libraries that accumulates over time, so increasing the heap size would only postpone the problem. Or maybe, since Full GCs have become very frequent, all other threads are being stopped very frequently ("stop-the-world" pauses). I need help figuring out the root cause of this behaviour.

AkaSh
  • "It has around 20 pollers running each having ThreadPoolExecutor of size 30 for processing messages." - This seems excessive. – Jacob G. Apr 30 '19 at 23:12
    I think you are correct. This looks like a memory leak. That leads to the GC having to run more and more frequently ... until you are spending most of the time running stop-the-world GCs. GC tuning won't help. Find and fix the memory leak. – Stephen C Apr 30 '19 at 23:23
  • The leak looks like it might be to do with finalization. Investigate!! As various sources state, finalization happens after the main GC has run, and is often single threaded. If there is too much finalization to be done, it can be a bottleneck. The same thing can also occur with `Reference` objects. If this is the problem, then 1) look to see if you can enable multiple finalizer / reference queue processing threads, and 2) look to see if you can reduce your use of finalization and `Reference` objects. Both are expensive. – Stephen C Apr 30 '19 at 23:34
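If finalization does turn out to be the culprit, one way to act on the second suggestion is to replace `finalize()` with `java.lang.ref.Cleaner` (Java 9+), which releases resources deterministically via `close()` and keeps the cleaner only as a safety net. This is a generic sketch; `PooledResource` and its state are hypothetical, not classes from the question:

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

public class PooledResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();
    static final AtomicBoolean RELEASED = new AtomicBoolean(false);

    // The cleanup state must not hold a reference to the outer object,
    // otherwise the object would never become unreachable.
    private static final class State implements Runnable {
        @Override public void run() {
            RELEASED.set(true); // release native/pooled resources here
        }
    }

    private final Cleaner.Cleanable cleanable;

    public PooledResource() {
        cleanable = CLEANER.register(this, new State());
    }

    @Override public void close() {
        cleanable.clean(); // deterministic release; the Cleaner is only a fallback
    }

    public static void main(String[] args) {
        try (PooledResource r = new PooledResource()) {
            // use the resource
        }
        System.out.println("released: " + RELEASED.get());
    }
}
```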

1 Answer


The GC pattern looks like a memory leak.

Looking at your heap dump stats, I can see 3M tasks waiting for execution in thread pools.

I can speculate that you are using thread pools with an unbounded task queue. Your inbound message rate is greater than the processing capacity of the system, so the backlog keeps growing, consuming more and more memory and eventually leading to death by GC.
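This failure mode is easy to reproduce in a minimal sketch (illustrative code, not the asker's): `Executors.newFixedThreadPool` backs the pool with an unbounded `LinkedBlockingQueue`, so every submitted task is accepted no matter how far behind the workers are, and the backlog grows without limit:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class UnboundedBacklogDemo {
    public static void main(String[] args) {
        // newFixedThreadPool uses an unbounded LinkedBlockingQueue internally.
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(2);

        // Producer is much faster than the 2 slow workers,
        // so nearly all tasks pile up in the queue.
        for (int i = 0; i < 10_000; i++) {
            pool.submit(() -> {
                try { Thread.sleep(10); } catch (InterruptedException ignored) {}
            });
        }
        System.out.println("tasks waiting in queue: " + pool.getQueue().size());
        pool.shutdownNow();
    }
}
```

Each queued task (plus whatever state it captures) is strongly reachable from the pool, which matches the 3M waiting tasks seen in the heap dump.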

Depending on your case, you may either limit the queue size for the thread pool or try to optimize the memory footprint of the queued tasks.

Limiting the queue size would create back pressure on the previous processing stage. If the producer for the thread pool is a simple timer-driven poller, the effect would be a reduced polling rate (as the poller would block waiting for room in the queue).
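As a sketch of the bounded-queue option (sizes and names are illustrative, not taken from the question): a `ThreadPoolExecutor` over a bounded `ArrayBlockingQueue` with `CallerRunsPolicy` makes the submitting poller thread execute overflow work itself, which is exactly the back-pressure effect described above:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolDemo {
    public static void main(String[] args) throws Exception {
        // Bounded queue: at most 100 tasks can wait. When it is full,
        // CallerRunsPolicy makes the submitting (poller) thread run the
        // task itself, slowing the producer down to the pool's pace.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 1_000; i++) {
            pool.execute(() -> {
                try { Thread.sleep(1); } catch (InterruptedException ignored) {}
            });
        }
        // The backlog can never exceed the queue capacity of 100.
        System.out.println("tasks waiting in queue: " + pool.getQueue().size());
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```

An alternative to `CallerRunsPolicy` is a rejection handler that blocks on `queue.put(...)`, which pauses the poller outright instead of borrowing it as a worker.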

Optimizing the task memory footprint would work only if your processing capacity is, on average, greater than the inbound task rate and the problem is caused by a temporary surge.

Alexey Ragozin