
We have an old Hadoop cluster based on Hortonworks HDP 2.6.4. The cluster includes two namenode services, one active and one standby. All machines in the cluster run RHEL 7.2, and we do not see any problems at the OS level.

The cluster also includes 12 worker machines (each worker runs the DataNode and NodeManager services and has 32 cores).

The story began when we got alerts from the smoke-test script complaining about "Detected pause in JVM or host machine" on the standby namenode. Based on that, we decided to increase the namenode heap size from 60G to 100G.

The new setting was based on a table that shows how much memory to allocate according to the number of files in HDFS. Based on that table we set the namenode heap size to 100G and then restarted the HDFS service.
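
For reference, this is roughly what the change amounts to in hadoop-env.sh, as a minimal sketch only (with Ambari the value is normally changed from the HDFS configuration UI, which regenerates this file; the exact line layout below is an assumption, not a copy of our file):

  # hadoop-env.sh fragment (sketch): the namenode heap is the -Xms/-Xmx pair
  # passed to the namenode JVM; later options win, so append the new values
  export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -Xms100g -Xmx100g"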

After HDFS was completely restarted, we still saw the "Detected pause in JVM or host machine" messages, which is really strange because we had almost doubled the namenode heap size.

So we started deeper testing with jstat. For example, jstat reports a very low FGCT value, which is a good sign and does not point to a namenode heap problem (1837 is the namenode PID):

  # /usr/jdk64/jdk1.8.0_112/bin/jstat -gcutil    1837 10 10
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
  0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
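
As a rough sanity check of those counters (a sketch only, using the numbers printed above): dividing the cumulative GC time by the collection count gives the average pause per collection, and both averages are well under a second, which again does not point to a heap problem.

  # average pause per collection, from the jstat counters above (bc for the arithmetic)
  echo "scale=3; 1.798 / 2" | bc        # full GC:  ~0.899 s per collection (FGCT / FGC)
  echo "scale=3; 173.922 / 197" | bc    # young GC: ~0.883 s per collection (YGCT / YGC)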

And here are the messages from the namenode log:

2022-10-27 14:04:49,728 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2044ms
2022-10-27 16:21:33,973 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2524ms
2022-10-27 17:31:35,333 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2444ms
2022-10-27 18:55:55,387 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2134ms
2022-10-27 19:42:00,816 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2153ms
2022-10-27 20:50:23,624 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2050ms
2022-10-27 21:07:01,240 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2343ms
2022-10-27 23:53:00,507 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2120ms
2022-10-28 00:43:30,633 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1811ms
2022-10-28 00:53:35,120 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2192ms
2022-10-28 02:07:39,660 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2353ms
2022-10-28 02:49:25,018 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1698ms
2022-10-28 03:00:20,592 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2432ms
2022-10-28 05:02:15,093 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2016ms
2022-10-28 06:52:46,672 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approxim

As you can see, a "Detected pause in JVM or host machine" message appears roughly every one to two hours.
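
To quantify that, a quick count of pauses per hour can be pulled from the namenode log (a sketch; the log path below is an assumption and needs to be adjusted for your environment):

  # count JvmPauseMonitor messages per hour (log path is a placeholder)
  grep "Detected pause in JVM or host machine" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log \
    | awk '{ print substr($1 " " $2, 1, 13) }' | sort | uniq -c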

We checked the number of files in HDFS: there are about 7 million files.
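
For perspective, a back-of-the-envelope estimate using the commonly cited figure of roughly 150 bytes of namenode heap per HDFS object (file or block), and assuming on the order of one block per file (both the figure and the assumption are ours, not from the sizing table we used), suggests the live metadata for 7 million files is only a few gigabytes, so a 100G heap should be far more than enough:

  # ~7M files + ~7M blocks, ~150 bytes each (rule-of-thumb estimate only)
  echo "7000000 * 2 * 150 / 1024^3" | bc -l    # ~1.96 GB of live metadata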

From the documentation at https://docs.cloudera.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-operations/content/tuning_garbage_collection.html

we understand that one option to address the issue is to tune CMSInitiatingOccupancyFraction.

CMSInitiatingOccupancyFraction is indeed already set to 70 in our environment, as follows:

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote -XX:+UseConcMarkSweepGC \
  -XX:ParallelGCThreads=8 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 \
  -Xms1G -Xmx1G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:PermSize=128M -XX:MaxPermSize=256M \
  -verbose:gc -Xloggc:/Users/chris/hadoop-deploy-trunk/hadoop-3.0.0-SNAPSHOT/logs/gc.log-`date +'%Y%m%d%H%M'` \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps \
  -XX:ErrorFile=/Users/chris/hadoop-deploy-trunk/hadoop-3.0.0-SNAPSHOT/logs/hs_err_pid%p.log \
  -XX:+HeapDumpOnOutOfMemoryError $HADOOP_NAMENODE_OPTS"

So based on that, is it right to increase CMSInitiatingOccupancyFraction to 90?
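
For context on what that number controls (our reading, not from the documentation page above): with -XX:+UseCMSInitiatingOccupancyOnly, the fraction is the old-generation occupancy at which a concurrent CMS cycle starts, so the change in question would look like this inside the same options line:

  # the relevant part of HADOOP_NAMENODE_OPTS if the fraction were raised to 90
  # (a *higher* value starts the concurrent CMS cycle later, a lower value earlier)
  -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=90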

Additionally, from the environment settings we can see ParallelGCThreads=8, but each machine has 32 cores. So, can we increase ParallelGCThreads to 32 to match the real number of cores? Would that help?
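
On that second point, it may be worth checking what the JVM itself would choose before overriding it; HotSpot's ergonomic default for machines with more than 8 cores is roughly 8 + (cores − 8) × 5/8, i.e. about 23 threads on a 32-core box. A quick way to confirm (a sketch, assuming the same JDK path as the jstat command above):

  # print the JVM's default ParallelGCThreads for this machine
  /usr/jdk64/jdk1.8.0_112/bin/java -XX:+PrintFlagsFinal -version 2>/dev/null \
    | grep ParallelGCThreads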

  • Consider using a different garbage collector. I believe this is entirely expected for CMS with a lot of memory. – Mark Rotteveel Oct 30 '22 at 13:33
  • About CMSInitiatingOccupancyFraction and ParallelGCThreads: so you are against increasing them? – Judy Oct 30 '22 at 13:38
  • What he is saying is that CMS is unlikely to work well ... *no matter what you try to tune it*. If you want better performance, you should update your cluster to a newer version of Java so that you can get the benefit of a more modern garbage collector which supports large heaps better. (And probably you should update the version of Hadoop too. And maybe even the RHEL version ...) – Stephen C Oct 30 '22 at 14:56
  • I think you should try either G1GC or ZGC. Note that CMS is deprecated in Java 9 and removed entirely in Java 14 ... which should tell you something. – Stephen C Oct 30 '22 at 15:02
