
Does the replication factor in HDFS have to be at least 3? The main purpose of the default value of 3 is fault tolerance, and the probability of a rack failure is far lower than the probability of a node failure. Is there another reason the replication factor should be at least 3?

Steve Severance

1 Answer


There is no requirement that the replication factor be 3; that is simply the default Hadoop ships with. You can set the replication factor individually for each file in HDFS. Beyond fault tolerance, having replicas allows jobs that consume the same data to run in parallel. Replicas also let Hadoop run multiple copies of the same task speculatively and take whichever finishes first, which is useful when a node is running slow for some reason.
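To illustrate the per-file replication setting mentioned above, here is a minimal sketch; the file paths are hypothetical, and these commands assume a running HDFS cluster:

```
# Cluster-wide default replication factor, set in hdfs-site.xml:
#   <property>
#     <name>dfs.replication</name>
#     <value>3</value>
#   </property>

# Override the replication factor for a single file at write time:
hadoop fs -D dfs.replication=2 -put localfile /user/hadoop/myfile

# Change the replication factor of an existing file
# (-w waits until the re-replication completes):
hadoop fs -setrep -w 2 /user/hadoop/myfile
```

Note that `-setrep` only changes the target for the given path; the NameNode then schedules block copies or deletions in the background, which is why `-w` can take a while on large files.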

Steve Severance