
I have created an HDFS cluster with one namenode, two datanodes, and one secondary namenode, so four machines/servers in total (with four IPs).

The configuration (core-site.xml, hdfs-site.xml, ...) is set up on the namenode and then copied to the ~/hadoop.X.X.X/etc/hadoop folder on the two datanodes and the secondary namenode.
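
For what it's worth, the copying is done along these lines (the hostnames below are placeholders for my actual machines):

    # run from the namenode; datanode1, datanode2, secondarynn are placeholder hostnames
    for host in datanode1 datanode2 secondarynn; do
        scp ~/hadoop.X.X.X/etc/hadoop/core-site.xml ~/hadoop.X.X.X/etc/hadoop/hdfs-site.xml \
            "$host":~/hadoop.X.X.X/etc/hadoop/
    done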

In hdfs-site.xml:

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>

The replication factor is set to 2. However, every time I upload data to HDFS, the web interface shows a replication factor of 3, and the corresponding file is reported as under-replicated.
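
As far as I understand, dfs.replication is a client-side property: the replication factor is chosen by the client that writes the file, based on the configuration that client sees. So one sanity check is to print the value actually picked up on the machine doing the upload (assuming its configuration directory points at the copied files):

    hdfs getconf -confKey dfs.replication

If this prints 3 on the uploading machine, that client is not reading the hdfs-site.xml shown above.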


I can of course run hadoop fs -setrep -w 2 /hdfsPathToTheFile to change the replication factor from 3 back to 2, and once I have run hadoop fs -setrep -w 2 / for all the existing files, no file is reported as under-replicated.
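
Relatedly, the replication factor can be forced for a single upload via the generic -D option (localfile.txt is just a placeholder here):

    # override dfs.replication for this one command only
    hadoop fs -D dfs.replication=2 -put localfile.txt /hdfsPathToTheFile

But that is a per-command workaround, not a fix.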

However, I want to avoid this situation altogether and have the dfs.replication (=2) defined in hdfs-site.xml applied to newly uploaded files.

The same problem has been discussed here: https://community.cloudera.com/t5/Support-Questions/Replication-factor-in-HDFS/td-p/117934, but none of the answers solves my problem.

On Stack Overflow, these two posts are also similar: HDFS replication property not reflecting as defined in hdfs-site.xml, HDFS replication factor.

I have tried following their advice, e.g. running hdfs dfsadmin -refreshNodes, restarting the hadoop cluster, and even re-installing a completely new hadoop cluster. But none of this solves the problem: the dfs.replication (=2) is still not reflected for newly uploaded files.
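
To double-check what the namenode has actually recorded for a given file, the replication of its blocks can be inspected with fsck (same placeholder path as above):

    # lists the file and its blocks, including their replication
    hdfs fsck /hdfsPathToTheFile -files -blocks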

Does anyone have an idea?

  • I do not know what happened, but magic happened when I restarted HDFS (start-dfs.sh) again. Now the dfs.replication on the HDFS web interface is shown as 2. – XYZ May 20 '21 at 09:31
  • You probably configured `hdfs-site.xml` after starting it. – Majid Hajibaba May 31 '21 at 12:55

0 Answers