I'm upgrading Hadoop from version 3.0.0 to 3.2.2. These are the steps I followed:
- Get the active namenode:
$ hdfs haadmin -getServiceState nn1
standby
$ hdfs haadmin -getServiceState nn2
active
- Turn safe mode on and save the namespace (commands run on nn2):
$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace
- Stop all Hadoop services and upgrade the binaries on all nodes
- Start zookeeper-failover-controller and journalnode on required nodes
- Start nn2 (the last active namenode) with the -upgrade -renameReserved flags:
$ hdfs --daemon start namenode -upgrade -renameReserved
From the logs:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = <hostname2>/<IP2>
STARTUP_MSG: args = [-upgrade, -renameReserved]
STARTUP_MSG: version = 3.2.2
org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode [-upgrade, -renameReserved]
org.apache.hadoop.hdfs.server.namenode.FSImage: Starting upgrade of local storage directories.
old LV = -64; old CTime = 1636096354752.
new LV = -65; new CTime = 1642257088033
org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory
org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory
org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
- Stop nn2, start it normally, and turn safe mode off (we have defined the daemons as services, so nn2 is started as a service):
$ hdfs --daemon stop namenode
$ sudo service hadoop-hdfs-namenode start
From the logs:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = <hostname2>/<IP2>
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.2.2
org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 0 needs additional 2537 blocks to reach the threshold 0.9990 of total blocks 2540.
The minimum number of live datanodes is not required. Safe mode will be turned off automatically once the thresholds have been reached.
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node every 120 seconds.
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://<hostname1>:<port>]
Serving checkpoints at http://<hostname2>:<port>
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode
org.apache.hadoop.ipc.Client: Retrying connect to server: <hostname1>/<ip1>:<port>
- Turn safe mode off on the active namenode (nn2 in this case):
$ hdfs dfsadmin -safemode leave
safemode: Call From <hostname2>/<IP2> to <hostname1>:<port> failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
- Wait until safemode is turned off:
$ hdfs dfsadmin -safemode get
safemode: Call From <hostname2>/<IP2> to <hostname1>:<port> failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
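Both errors above come from a call to <hostname1>, i.e. nn1, which suggests that dfsadmin is sending the safe mode request to every namenode in the HA nameservice rather than only to nn2. A minimal sketch of sending a safe mode command to a single namenode, assuming dfsadmin honours Hadoop's generic -fs option with an explicit hdfs://host:port URI; the function names safemode_on and safemode_is_off are hypothetical.

```shell
# Run a dfsadmin safe mode action (get, enter, leave, wait) against ONE
# namenode, given its RPC host and port.
# Assumption: the generic -fs option overrides fs.defaultFS for this call,
# so the HA nameservice (and the stopped nn1) is not contacted.
safemode_on() {
  local host="$1" port="$2" action="$3"
  hdfs dfsadmin -fs "hdfs://${host}:${port}" -safemode "${action}"
}

# Exit status 0 only if that namenode reports "Safe mode is OFF";
# a failed call or any other answer counts as "not off".
safemode_is_off() {
  local out
  out="$(safemode_on "$1" "$2" get)" || return 1
  case "$out" in
    *"Safe mode is OFF"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `safemode_on <hostname2> <port> leave` would target nn2 only, with <port> being nn2's RPC port.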
I expect safe mode to be turned off on nn2 and an exception only for nn1, since nn1 is not running, but that is not happening: safe mode stays on for nn2. After this point the logs show nothing except attempts to connect to <hostname1>/<IP1>:<port>.
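To check which HA state nn2 actually entered and how far it is from leaving safe mode, the namenode log can be scanned for the messages quoted in the steps. A minimal sketch, assuming the log file path is passed as an argument (the path is not given in this post) and that the message texts match the 3.2.2 lines above; the function names last_ha_state and safemode_progress are hypothetical.

```shell
# Print the HA state ("active" or "standby") from the most recent
# "Starting services required for ... state" line in the given log file.
last_ha_state() {
  grep -o 'Starting services required for [a-z]* state' "$1" \
    | tail -n 1 \
    | awk '{print $5}'
}

# Print "reported needed threshold total" from the most recent
# "The reported blocks ..." safe mode message in the given log file.
safemode_progress() {
  grep -o 'The reported blocks [0-9]* needs additional [0-9]* blocks to reach the threshold [0-9.]* of total blocks [0-9]*' "$1" \
    | tail -n 1 \
    | awk '{print $4, $7, $13, $17}'
}
```

For the excerpt in the "stop nn2 and start it normally" step, last_ha_state would print standby and safemode_progress would print 0 2537 0.9990 2540, i.e. no blocks had been reported to nn2 yet.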
After step 8 (waiting for safe mode to turn off), I normally start the standby namenode, run hdfs namenode -bootstrapStandby, then restart both namenodes and finalize the upgrade.
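The remaining steps described above could be sketched as small shell functions. This is a sketch of the procedure as described in this post, not a verified procedure: the function names are hypothetical, the service name hadoop-hdfs-namenode is taken from the "start it normally" step, and stopping via the same service wrapper is an assumption.

```shell
# Run on nn1 (the standby) while the upgraded nn2 is running.
# -bootstrapStandby copies the latest namespace image from the other
# namenode into nn1's local storage and then exits; nn1 is then started.
bootstrap_nn1() {
  hdfs namenode -bootstrapStandby || return 1
  sudo service hadoop-hdfs-namenode start
}

# Run on each namenode host in turn.
# Assumption: the service wrapper supports "stop" as well as "start".
restart_local_namenode() {
  sudo service hadoop-hdfs-namenode stop && sudo service hadoop-hdfs-namenode start
}

# Run once, after both namenodes are back and healthy.
finalize_upgrade() {
  hdfs dfsadmin -finalizeUpgrade
}
```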
I've tested these upgrade steps at least 20-25 times, but this time I'm stuck at step 7 (turning safe mode off).
Since nn2 (<hostname2>) was the active namenode before the upgrade, I expect it to come up as the active namenode again (and turn safe mode off), but that is not happening here; the log instead shows it starting the services required for standby state. I could not find anything related to this. Can someone please help?
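Whether nn2 has actually become active can be checked with the same haadmin command used in step 1. A minimal sketch that loops over the namenode IDs (nn1 and nn2 in this setup) and prints the first one reporting active; the function name active_namenode is hypothetical, and a namenode whose query fails (such as the stopped nn1) is simply skipped.

```shell
# Print the ID of the first namenode that reports "active", e.g.
#   active_namenode nn1 nn2
# Returns 1 if none does (all standby or unreachable).
active_namenode() {
  local id state
  for id in "$@"; do
    # Skip namenodes whose state query fails, e.g. connection refused.
    state="$(hdfs haadmin -getServiceState "$id" 2>/dev/null)" || continue
    if [ "$state" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}
```

Given the standby-state log line for nn2 and nn1 still down, this would be expected to report no active namenode until nn2 is transitioned to active.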