Namenode not starting without formatting

Question

I have a hadoop cluster setup manually in high availability with one primary namenode,one standby namenode and one datanode. I formatted the namenode in the initial startup process, but if all the servers shut down due to electricity outage, I have to restart all the services again such as journal node,zookeeper,namenode,datanode and failover daemons.

However, when I start failover daemons(dfszkfailover) on both the namenodes, the namenode stops on both of them. For it to start, I have to format the namenode(on primary namenode) and delete the temp files.

This is my local setup and have to integrate on production as well. I cannot keep formatting namenode in production environment as I will lose all the data.

I read some of the answers explaining to point the namenode and datanode directory(in hdfs-site.xml) to some specific folder other than /tmp. I am already pointing it in my home directory.

Please suggest some method so that namenode and failover daemons both start without formatting the namenode.

EDIT-- I am sharing the last 100 lines of my namenode nn1 logs


192.168.12.12:8485: Journal Storage Directory /tmp/hadoop/dfs/journalnode/ha-cluster not formatted
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:479)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.getLastPromisedEpoch(Journal.java:247)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getJournalState(JournalNodeRpcServer.java:139)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getJournalState(QJournalProtocolServerSideTranslatorPB.java:118)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25415)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:286)
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:180)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:438)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1521)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1180)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1919)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1777)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1649)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2019-12-06 10:57:57,541 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [192.168.12.10:8485, 192.168.12.11:8485, 192.168.12.12:8485], stream=null))
2019-12-06 10:57:57,546 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.12.10
************************************************************/

Can you please post the logs here of the namenode? Also, do check the cluster ID of each of the nodes. All should be in sync. Cluster ID can be found in VERSION file under namenode/datanode then current directory configured in hdfs-site.xml. — Shash, Nov 29 '19 at 13:52
I have added the logs. Also I checked the version of cluser id. It is same on all three nodes — Priyanka Varshney, Dec 06 '19 at 07:46
Few things which you can do - 1. Check if the directory "/tmp/hadoop/dfs/journalnode/ha-cluster" is in sync with the one configured in the Hadoop config files. 2. Try setting up this directory in any other path apart from /tmp. 3. Type "jps" before you restart the services. Just to check if there are any Zombie processes running. — Shash, Dec 09 '19 at 15:05
I have not mentioned any path like "/tmp/hadoop/dfs/journalnode/ha-cluster". In my core-site.xml I have added property -dfs.journalnode.edits.dir and in this I have added a path pointing to my home directory. Do I have to mention dfs.journalnode.edits.dir in hdfs-site.xml or any other property in hdfs-site.xml or core-site.xml? — Priyanka Varshney, Dec 11 '19 at 05:42

Namenode not starting without formatting

0 Answers0