I am trying to list all the running services with the jps command, but only the following processes show up (the NameNode is missing):

 3633 SecondaryNameNode
 4228 Jps
 3493 DataNode
 4198 NodeManager
 4088 ResourceManager

I started all the services with start-dfs.sh and start-yarn.sh, but the result is the same. I went through the NameNode log and found the exception below.

 2018-06-29 16:02:31,414 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
 2018-06-29 16:02:31,414 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false. Rechecking.
 2018-06-29 16:02:31,416 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false
 2018-06-29 16:02:31,423 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
 2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
 2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
 2018-06-29 16:02:31,425 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
 java.io.IOException: Failed to load an FSImage file!
     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:673)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1006)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:736)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:531)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:587)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:754)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:738)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1427)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1493)
2018-06-29 16:02:31,428 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-06-29 16:02:31,454 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/

I have no clue how to solve this; please help. I am using hadoop-2.5.0-cdh5.3.2.

Mandrek

3 Answers

Follow these steps:

  1. Check the path to your FSImage, i.e., the directory where the NameNode stores it. In my case it is /hadoop/hdfs/namenode/current.

  2. Check the most recently created FSImage on both the NameNode and the Secondary NameNode, and find the latest one available.

  3. Copy the latest FSImage from the Secondary NameNode to the NameNode, keeping the same ownership and permissions it had on the Secondary NameNode. In my case the owner is hdfs:hadoop by default.

  4. After copying, restart all the services.
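
Steps 2 and 3 can be sketched as a small shell helper. This is only a minimal sketch, assuming the usual Hadoop 2.x layout in which checkpoints are stored as zero-padded `fsimage_<txid>` files, each with an `.md5` companion. The function names `latest_fsimage` and `copy_latest_fsimage` are made up for illustration, and both directories are passed in by you.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop): pick the newest fsimage in a
# directory and copy it, with its .md5 file, into another directory.

# Print the path of the fsimage with the highest transaction ID.
latest_fsimage() {
  local dir="$1"
  # Names are zero-padded (fsimage_0000000000000001234), so a plain
  # lexical sort orders them by transaction ID. Skip the .md5 files.
  ls "$dir"/fsimage_* 2>/dev/null | grep -v '\.md5$' | sort | tail -n 1
}

# Copy the newest fsimage (and its .md5, if present) from $1 to $2.
copy_latest_fsimage() {
  local src_dir="$1" dst_dir="$2" image
  image="$(latest_fsimage "$src_dir")"
  if [ -z "$image" ]; then
    echo "no fsimage found in $src_dir" >&2
    return 1
  fi
  # -p keeps mode and timestamps; ownership is kept only when run as root.
  cp -p "$image" "$dst_dir"/ || return 1
  if [ -f "$image.md5" ]; then
    cp -p "$image.md5" "$dst_dir"/ || return 1
  fi
  echo "copied $(basename "$image") to $dst_dir"
}
```

With the NameNode stopped, a call could look like `copy_latest_fsimage /hadoop/hdfs/namesecondary/current /hadoop/hdfs/namenode/current`. The first path is an assumption; use whatever dfs.namenode.checkpoint.dir points to on your cluster, and fix the ownership afterwards if you did not run it as root.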

Abhinav
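  1. Format the NameNode with "hdfs namenode -format". Note that this erases the existing HDFS metadata, so do it only if you can afford to lose the data.

  2. Make sure the clusterID in the NameNode's and the DataNode's VERSION files is the same. If they differ, copy one value over the other so that they match.
    In my case the files are /Path_installation_dir/hdata/dfs/name/current/VERSION and /Path_installation_dir/hdata/dfs/data/current/VERSION.

  3. Start DFS and YARN again.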
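
The clusterID check in step 2 could be scripted roughly like this. It is a sketch that assumes the VERSION file is a plain properties file containing a `clusterID=...` line; `cluster_id` and `same_cluster_id` are hypothetical helper names, not Hadoop commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the clusterID in two VERSION files.

# Print the clusterID value stored in a VERSION file.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Succeed only when both files carry the same non-empty clusterID.
same_cluster_id() {
  local a b
  a="$(cluster_id "$1")"
  b="$(cluster_id "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_cluster_id /Path_installation_dir/hdata/dfs/name/current/VERSION /Path_installation_dir/hdata/dfs/data/current/VERSION || echo "clusterID mismatch"` prints a message when the two values differ.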

In my case, I had two NameNodes running on Kubernetes, and after a server reboot the data got corrupted. I was getting "Failed to load image from FSImageFile" in the logs. namenode-0 was still healthy and namenode-1 had the problem, so I proceeded as follows:

  1. Scale the NameNodes down to 1, leaving only namenode-0.
  2. Delete the PVC of namenode-1.
  3. Make sure the volume claim is gone with kubectl get pvc -n hadoop.
  4. Scale the NameNodes back to 2.

namenode-0 took care of the data corruption and made the data available to namenode-1.
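
As a rough sketch, the four steps map onto kubectl like this. It assumes the NameNodes run as a StatefulSet called `namenode` (a guess based on the pod names namenode-0 and namenode-1); the PVC name depends on your volume claim template, so it is passed in as an argument. `recreate_namenode_1` is a made-up function name.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: a StatefulSet named "namenode" and a PVC name
# supplied by the caller. KUBECTL can be overridden for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: recreate_namenode_1 <namespace> <pvc-name>
recreate_namenode_1() {
  local ns="$1" pvc="$2"
  # 1. Leave only namenode-0 running.
  "$KUBECTL" scale statefulset namenode --replicas=1 -n "$ns" || return 1
  # 2. Delete the claim that backed namenode-1.
  "$KUBECTL" delete pvc "$pvc" -n "$ns" || return 1
  # 3. List the remaining claims so you can confirm it is gone.
  "$KUBECTL" get pvc -n "$ns" || return 1
  # 4. Bring namenode-1 back with a fresh volume.
  "$KUBECTL" scale statefulset namenode --replicas=2 -n "$ns"
}
```

Called as `recreate_namenode_1 hadoop <pvc-name>`, it runs the steps in order and stops at the first failure. It does not wait for the namenode-1 pod to terminate before deleting the claim, so running the commands by hand and checking in between may be safer.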

vale ale