
I have set up a very simple Hadoop cluster with 3 VMs - one acts as the NameNode and the other 2 are DataNodes. I created the HDFS file system with the format command and everything is working fine. I can save files to the HDFS system.

Now I want to add another data node. My assumption is that when I set up the new node, add it to the slaves file, and (presumably) restart HDFS, the system will realize a new node has been added, and disk space will be allocated and formatted on the new node to make it part of HDFS. Is this assumption correct? Obviously it would be undesirable to reformat the entire HDFS so I'm assuming Datanodes can be added "on the fly". Am I correct or do I need to perform other actions to make the new node provide storage for HDFS? Thanks!

Majid Hajibaba
ChrisRTech
  • Possible duplicate of [Is there a way to add nodes to a running Hadoop cluster?](https://stackoverflow.com/questions/13159184/is-there-a-way-to-add-nodes-to-a-running-hadoop-cluster) – aksss Mar 22 '19 at 06:55

1 Answer


I'm assuming datanodes can be added "on the fly".

Your assumption is correct.

Nothing on HDFS needs to be formatted. The disk(s) of the new DataNode should be formatted with a regular filesystem, and preferably the DataNode data directory path is the same as on the other nodes, but that isn't strictly necessary.
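As a sketch, the data directory is set per node in `hdfs-site.xml` via the `dfs.datanode.data.dir` property (the path below is just an example placeholder):

```xml
<!-- hdfs-site.xml on the new DataNode; /data/hdfs/datanode is an example path -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hdfs/datanode</value>
</property>
```

HDFS creates its block storage layout under that directory automatically the first time the DataNode service starts.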

You shouldn't need to restart HDFS. The DataNode registers with the NameNode over RPC when its service starts.
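In practice that means you only start the service on the new node and verify registration from the NameNode - a sketch assuming Hadoop 3.x command syntax:

```shell
# On the new node: start only the DataNode daemon (Hadoop 3.x syntax;
# older releases use `hadoop-daemon.sh start datanode`)
hdfs --daemon start datanode

# On the NameNode: confirm the new DataNode has registered and reports capacity
hdfs dfsadmin -report

# Optionally spread existing blocks onto the new node
hdfs balancer
```

Adding the new hostname to the `workers` file (`slaves` in Hadoop 2.x) is only needed so that cluster-wide scripts like `start-dfs.sh` include it in the future; it is not required for registration.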

Tip: Using Apache Ambari makes installing, configuring, and managing the services much easier than editing and syncing the XML files yourself.

OneCricketeer