Adobe Experience Manager (AEM), Java garbage collection tuning and memory management

Question

I am running Adobe Experience Manager (AEM), a Java application, for a client's site. It runs on OpenJDK:

#java -version

java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

It is running on Rackspace with the following:

vCPU: 4
Memory: 16GB
Guest OS: Red Hat Enterprise Linux 6 (64-bit)

Since it went into production, the application has been very slow. The pattern is this: I launch the app and everything runs smoothly, then 3 to 4 days later CPU usage spikes to 400% (about 4000 users/day hit the site). I got a few OOM exceptions (1 or 2), but mostly the site just became exceptionally slow without ever throwing an OOM exception. Since I am a novice at Java memory management, I started reading about how it works and found tools like jstat. The second time the system was overwhelmed, I ran:

#top

I got the PID of the java process, pressed Shift+H to show individual threads, and noted the IDs of the threads with high CPU usage. Then I ran:

#sudo -uaem jstack <PID>

This gave me a thread dump. I converted the thread IDs I had written down to hexadecimal and searched for those values in the dump. After all that, I found that, not surprisingly, it was the garbage collector that was flipping out for some reason.
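The decimal-to-hex step can be sketched in Java. HotSpot thread dumps list each native thread ID as nid=0x..., while top shows it in decimal. The class name, method name, and the sample thread ID below are hypothetical and only illustrate the conversion.

```java
// Hypothetical helper: converts a decimal thread ID noted in `top` into the
// hexadecimal `nid=0x...` form that appears in a HotSpot thread dump.
final class ThreadIdHex {
    static String toNid(int decimalThreadId) {
        return "nid=0x" + Integer.toHexString(decimalThreadId);
    }

    public static void main(String[] args) {
        // 12345 is a made-up thread ID used only for illustration.
        System.out.println(toNid(12345)); // prints nid=0x3039
    }
}
```

Searching the dump for the printed string should lead to the matching thread entry, for example one of the GC task threads.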

I read a lot about Java GC tuning and then restarted the application with the following options:

java

-Dcom.day.crx.persistence.tar.IndexMergeDelay=0 
-Djackrabbit.maxQueuedEvents=1000000 
-Djava.io.tmpdir=/srv/aem/tmp/

-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/srv/aem/tmp/ 
-Xms8192m -Xmx8192m 
-XX:PermSize=256m 
-XX:MaxPermSize=1024m 
-XX:+UseParallelGC 
-XX:+UseParallelOldGC 
-XX:ParallelGCThreads=4 
-XX:NewRatio=1

-Djava.awt.headless=true 
-server 
-Dsling.run.modes=publish 
-jar crx-quickstart/app/cq-quickstart-6.0.0-standalone.jar start 
-c crx-quickstart -i launchpad -p 4503 
-Dsling.properties=conf/sling.properties

It appears to perform much better, but I think it still needs more GC tuning.

When I run:

#sudo -uaem jstat -gcutil <PID>

I get this:

S0     S1     E      O       P      YGC   YGCT     FGC  FGCT      GCT   
0.00   0.00   55.97  100.00  45.09  4725  521.233  505  4179.584  4700.817

four days after the restart.
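To read these counters, it helps to turn the totals into averages. The sketch below does that arithmetic with the values from the output above. The class and method names are hypothetical, and the uptime is assumed to be exactly four days.

```java
// Hypothetical helper that derives averages from `jstat -gcutil` columns.
final class GcUtilSummary {
    // Average pause in seconds: total collector time divided by event count.
    static double averagePauseSeconds(double totalSeconds, long events) {
        return events == 0 ? 0.0 : totalSeconds / events;
    }

    // Share of wall-clock uptime spent in GC, as a percentage.
    static double gcOverheadPercent(double gcSeconds, double uptimeSeconds) {
        return 100.0 * gcSeconds / uptimeSeconds;
    }

    public static void main(String[] args) {
        // YGCT, FGCT, GCT (seconds) and YGC, FGC (counts) from the output above.
        double ygct = 521.233, fgct = 4179.584, gct = 4700.817;
        long ygc = 4725, fgc = 505;
        // Assumption: "4 days" is taken as exactly 4 * 24 * 3600 seconds.
        double uptime = 4 * 24 * 3600;

        System.out.printf("avg young GC: %.3f s%n", averagePauseSeconds(ygct, ygc));
        System.out.printf("avg full GC:  %.3f s%n", averagePauseSeconds(fgct, fgc));
        System.out.printf("GC overhead:  %.2f %%%n", gcOverheadPercent(gct, uptime));
    }
}
```

With these figures a full collection averages a little over 8 seconds, against roughly 0.11 seconds for a young collection, and the O column shows the old generation completely full.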

When I run:

#sudo -uaem jstat -gccapacity <PID>

I get this:

NGCMN       NGCMX       NGC         S0C         S1C         EC
4194304.0   4194304.0   4194304.0   272896.0    279040.0    3636224.0

OGCMN       OGCMX       OGC         OC          PGCMN       PGCMX
4194304.0   4194304.0   4194304.0   4194304.0   262144.0    1048576.0

PGC         PC          YGC         FGC
262144.0    262144.0    4725        509

four days after the restart.
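jstat reports these capacities in kB, so a quick conversion makes them easier to compare with the -Xmx and -XX:NewRatio settings. The following is a minimal sketch with a hypothetical class name; the numbers are copied from the output above.

```java
// Hypothetical helper for reading `jstat -gccapacity` figures, which are in kB.
final class JstatCapacity {
    static double kbToMb(double kb) {
        return kb / 1024.0;
    }

    public static void main(String[] args) {
        // Current young and old generation capacities from the output above.
        double ngc = 4194304.0, ogc = 4194304.0;
        // Survivor and eden capacities from the output above.
        double s0c = 272896.0, s1c = 279040.0, ec = 3636224.0;

        System.out.println("young (NGC): " + kbToMb(ngc) + " MB");
        System.out.println("old   (OGC): " + kbToMb(ogc) + " MB");
        System.out.println("S0C+S1C+EC : " + kbToMb(s0c + s1c + ec) + " MB");
    }
}
```

Both generations come out at 4096 MB, which is what an 8192 MB heap split by -XX:NewRatio=1 should give.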

These results are much better than when I started, but I think it can get even better. I'm not sure what to do next, as I'm no GC pro, so I was wondering whether you have any tips or advice on how to get better application/GC performance, and whether anything stands out as obviously wrong, such as the ratios and sizes of the young and old generations.

How should I set the survivor and eden sizes/ratios? Should I change the GC type, for example to CMS or G1? How should I proceed?

Any advice would be helpful.

Best,

Nicola

score 0 · Answer 1 · answered Apr 21 '15 at 09:01

The young-to-old ratio is typically 1:3, but it can vary depending on the application's mix of short-lived and long-lived objects. If short-lived objects dominate, the young space can be enlarged, for example to 2:3 (young:old). The reason to increase the ratio is to reduce the number of scavenge (young-generation) GC cycles: when many short-lived objects are allocated, the young space fills quickly and triggers scavenge cycles, which in turn hurt application performance. Raising the ratio above its current value can therefore reduce how often scavenge cycles run. When the young space grows, the survivor and eden spaces grow with it. The CMS collector is used to reduce application pause times, while G1 is targeted at larger heaps with high throughput. The GC policy can be changed to suit the needs of the application.
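The effect of a young:old ratio on a fixed heap can be worked out with simple arithmetic. The sketch below uses a hypothetical helper and the 8192 MB heap from the question. In HotSpot, -XX:NewRatio=N sets old:young to N:1, so 1:3 corresponds to -XX:NewRatio=3; a ratio such as 2:3 is not a whole-number NewRatio, so the sketch assumes the young size would be set explicitly (for example with -Xmn).

```java
// Hypothetical helper: splits a fixed heap by a young:old ratio.
final class GenerationSplit {
    // Young-generation size in MB for a heap divided young:old (rounded down).
    static long youngMb(long heapMb, int youngParts, int oldParts) {
        return heapMb * youngParts / (youngParts + oldParts);
    }

    public static void main(String[] args) {
        long heapMb = 8192; // matches -Xms8192m -Xmx8192m in the question
        System.out.println("1:1 -> young " + youngMb(heapMb, 1, 1) + " MB");
        System.out.println("1:3 -> young " + youngMb(heapMb, 1, 3) + " MB");
        System.out.println("2:3 -> young " + youngMb(heapMb, 2, 3) + " MB");
    }
}
```

Note that the question's -XX:NewRatio=1 is the 1:1 case, which already gives a larger young generation (4096 MB) than either 1:3 (2048 MB) or 2:3 (3276 MB).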

Recommended use cases for G1:

The first focus of G1 is to provide a solution for users running applications that require large heaps with limited GC latency. This means heap sizes of around 6 GB or larger, and stable, predictable pause times below 0.5 seconds. Since you use an 8 GB heap, you can test the G1 collector in the same environment to check its GC performance.
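When comparing collectors in a test environment, the same counters that jstat shows can also be read from inside the JVM through the standard java.lang.management API. The following is a minimal sketch, not an AEM-specific tool: the class name is hypothetical, and it reports only on the JVM it runs in, so it would have to be run inside the process being measured (for example started with -XX:+UseG1GC or -XX:+UseParallelOldGC).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reporter: lists each collector active in the current JVM with
// its cumulative collection count and cumulative collection time.
final class GcReport {
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is an approximate total in milliseconds.
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Taking a snapshot under the same load with each collector gives a like-for-like view of how many collections ran and how long they took in total.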

Thank you Mohan this is very helpful. So trying 1:3 (young:old) with G1 and with 8MB could be something to try ? — nabello, Apr 22 '15 at 15:13
I'm still gathering information and haven't yet solved this. Since it is a production environment I haven't been able to test your suggestions. — nabello, Apr 29 '15 at 18:50
