I have created an HDFS cluster with one namenode, two datanodes, and one secondary namenode, so four machines/servers in total (with four IPs).
The configuration (core-site.xml, hdfs-site.xml, ...) is set up on the namenode and then copied to the ~/hadoop.X.X.X/etc/hadoop folder on the two datanodes and the secondary namenode.
In hdfs-site.xml, the replication factor is set to 2:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
However, every time I upload data to HDFS, the web interface shows a replication factor of 3, and the corresponding file is reported as under-replicated.
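For reference, one way to check which dfs.replication value the Hadoop client actually resolves is the getconf tool, run on the machine that performs the upload (just a sanity check; it should print 2 if the copied hdfs-site.xml is being picked up):

hdfs getconf -confKey dfs.replication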
I can of course run
hadoop fs -setrep -w 2 /hdfsPathToTheFile
to change the replication factor from 3 back to 2. And once I have run
hadoop fs -setrep -w 2 /
on all existing files, no file is reported as under-replicated anymore.
However, I want to avoid this situation and have the dfs.replication (=2) defined in hdfs-site.xml take effect for newly uploaded files.
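As far as I understand, the replication factor is chosen by the client at write time, so an explicit per-upload override via the generic -D option should also work (a sketch; testFile and /hdfsPath are placeholder names):

hadoop fs -D dfs.replication=2 -put testFile /hdfsPath

But this is again a manual step per upload, which is exactly what I want to avoid.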
The same problem has been discussed here: https://community.cloudera.com/t5/Support-Questions/Replication-factor-in-HDFS/td-p/117934, but none of the answers solves my problem.
On Stack Overflow, these two posts are also similar: "HDFS replication property not reflecting as defined in hfs-site.xml" and "HDFS replication factor".
I have tried to follow their advice, e.g. running hdfs dfsadmin -refreshNodes, restarting the Hadoop cluster, and even reinstalling a completely new Hadoop cluster. But none of these steps solves the problem: dfs.replication (=2) is still not reflected for newly uploaded files.
Does anyone have an idea?