I've reduced the replication factor from 3 to 1, yet I don't see any activity from the namenode or between datanodes to remove over-replicated HDFS file blocks. Is there a way to monitor or force the replication job?
4 Answers
Changing `dfs.replication` will only apply to new files you create; it will not modify the replication factor of files that already exist.
To change the replication factor of files that already exist, you can run the following command, which is applied recursively to every file in HDFS:
hadoop dfs -setrep -w 1 -R /
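As a side note (not from the answer above; the paths here are placeholders), a filesystem check reports how many blocks are still over-replicated, which is one way to monitor the cleanup:

# Summary for the whole filesystem; the report includes an
# "Over-replicated blocks" counter that should drop to 0 as the
# NameNode schedules excess replicas for deletion.
hdfs fsck /

# Restrict the check to one directory and list per-file block details
# (/user/example is a placeholder path).
hdfs fsck /user/example -files -blocks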

- Shouldn't it be `hadoop fs -setrep -w 1 -R /`? – zeekvfu Nov 28 '13 at 07:35
- Ideally, it should be `hadoop fs`. But although `hadoop dfs` is deprecated, it still works. – PradeepKumbhar Mar 31 '16 at 06:22
- @zeekvfu, indeed, with the latest release of Apache HDFS it must be: `hdfs dfs -setrep -w 2 -R /` – Mohammed Acharki Dec 18 '17 at 16:36
- As of 06-2023, the current form of the HDFS filesystem commands is `hdfs dfs -command` – Luis Vazquez Jun 03 '23 at 23:04
When you change the default replication factor from 3 to, say, 2 in Cloudera Manager
Cloudera Manager (CDH 5.0.2) -> HDFS -> Configuration -> View and Edit -> Service-Wide -> Replication -> Replication Factor (dfs.replication) -> 2
then only newly written data will have 2 replicas for each block.
Please use
hdfs dfs -setrep 2 /
on the command line (generally on a node with the HDFS Gateway role) if you want to change the replication factor of all existing data. This command recursively changes the replication factor of every file under the root directory /.
Syntax:
hdfs dfs -setrep [-R] [-w] <numReplicas> <path>
where
- the -w flag requests that the command wait for the replication to complete, which can take a very long time
- the -R flag is accepted only for backwards compatibility and has no effect
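As a concrete sketch (the paths below are placeholders, not from the answer), you can also target a single directory and then confirm the stored replication factor:

# Set replication to 2 for everything under /data/example
# (recursion over a directory happens automatically).
hdfs dfs -setrep 2 /data/example

# %r prints the replication factor recorded for a file
# (part-00000 is a placeholder file name).
hdfs dfs -stat %r /data/example/part-00000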
The new replication factor affects only new files. To change the replication factor for existing files, run in a shell (on the node with the hadoop entry point):
hadoop fs -setrep -w <replication factor> -R /
But only "hdfs" can write to / ("hdfs" is the superuser, not "root"), so you may have to run this instead:
sudo -u hdfs hadoop fs -setrep -w <replication factor> -R /
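To confirm the change took effect (a quick check using a placeholder path, not part of the answer itself), the second column of a directory listing shows each file's replication factor:

# The "1" in the second column below is the replication factor
# (/data/example is a placeholder path; the output line is illustrative).
hadoop fs -ls /data/example
# -rw-r--r--   1 hdfs supergroup   1048576 2023-06-03 23:04 /data/example/file.txt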
