
I am trying to configure a 50-node Hadoop 2.6.0 cluster for failure tolerance. Specifically, I'd like to be able to suddenly stop 5 servers and still have my job complete. So far, stopping even 1 server causes my job to fail with a "too many map failures" error.

We host HDFS on the same cluster, with a replication factor of 2.

Can someone provide guidance on how to do this?

Having looked at similar posts, I should note that I am not looking to have my job complete on a subset of the data.

manlio
tix

0 Answers