
I recently came across an interesting scenario with Cloudera Hadoop and HDFS where we were unable to start our NameNode Service.

When attempting a restart of the HDFS Services, we were unable to successfully restart the NameNode Service in our cluster. Upon review of the logs, we did not observe any ERRORs but did see a few entries related to JvmPauseMonitor:

org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 5015ms

We were observing these entries in /var/log/hadoop-hdfs/NAMENODE.log.out and were not seeing any other errors anywhere else, including in /var/log/messages.
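To get a sense of how often and how long the pauses are, the JvmPauseMonitor entries can be pulled out of the log with a small script. This is only a sketch: the regular expression is built from the single log line quoted above, and the function names (`pause_durations_ms`, `summarize_pauses`) are hypothetical helpers, not part of Hadoop.

```python
import re

# Built from the one JvmPauseMonitor message quoted in the question;
# other Hadoop versions may word the message differently (assumption).
PAUSE_RE = re.compile(
    r"JvmPauseMonitor: Detected pause in JVM or host machine \(eg GC\): "
    r"pause of approximately (\d+)ms"
)


def pause_durations_ms(lines):
    """Return the pause length in milliseconds for each matching log line."""
    durations = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations


def summarize_pauses(lines):
    """Summarize how many pauses were logged and how long they were."""
    durations = pause_durations_ms(lines)
    return {
        "count": len(durations),
        "max_ms": max(durations, default=0),
        "total_ms": sum(durations),
    }
```

To use it, open the NameNode log (for example `/var/log/hadoop-hdfs/NAMENODE.log.out`) and pass the file object to `summarize_pauses`. Frequent multi-second pauses with no other errors are a hint to look at heap pressure.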

mnille
KnownTraveler

1 Answer


CHECK YOUR JAVA HEAP SIZES

Ultimately, we were able to determine that we were running into a Java OOM Exception that wasn't being logged.

From a performance perspective, as a general rule, you should configure at least 1GB of Java Heap Size for every 1 Million Blocks in HDFS.

In our case, we had grown to 1.5 Million Blocks but were still using the default 1GB setting for the Java Heap Size. The resolution was as simple as increasing the Java Heap Size for the NameNode and Secondary NameNode Services and restarting.

After increasing the Java Heap Size to at least 2GB and restarting the HDFS Services we were green across the board.
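The sizing rule above can be written down as a quick check. This is a rough sketch of the rule of thumb only: rounding up to a whole GB and the function names are my own assumptions, and the block count is something you would read from your cluster yourself.

```python
import math

# Rule of thumb from this answer: at least 1GB of heap per 1 million blocks.
BLOCKS_PER_GB = 1_000_000


def recommended_heap_gb(block_count, minimum_gb=1):
    """Minimum NameNode heap in GB, rounded up to a whole GB (assumption)."""
    return max(minimum_gb, math.ceil(block_count / BLOCKS_PER_GB))


def heap_is_sufficient(block_count, heap_gb):
    """True if the configured heap meets the rule of thumb."""
    return heap_gb >= recommended_heap_gb(block_count)
```

With our numbers, `recommended_heap_gb(1_500_000)` gives 2, and `heap_is_sufficient(1_500_000, 1)` is false, which matches what we saw with the default 1GB heap.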

Cheers!

KnownTraveler