
I am seeing the following error when I process large files (size > 35 GB), but it doesn't happen with smaller files (size < 10 GB).

App > Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#30
App >     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
App >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
App >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
App >     at java.security.AccessController.doPrivileged(Native Method)
App >     at javax.security.auth.Subject.doAs(Subject.java:422)
App >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
App >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
App > Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

The job still finishes under Qubole, since I think Qubole retries the reduce step.

But I was wondering if there is a setting that would let me avoid these errors entirely, so that the reduce job doesn't have to retry.

App > Failed reduce tasks=54
Jal

1 Answer


Increase reducer parallelism. You can do this by setting the mapreduce.job.reduces configuration property. If you are running a Java application whose driver uses ToolRunner (so that generic -D options are parsed), pass the property on the command line like this:

hadoop jar your_jar.jar -Dmapreduce.job.maps=100 -Dmapreduce.job.reduces=200 ...

In Hive, it can be done using the hive.exec.reducers.bytes.per.reducer property (lower values make Hive plan more reducers for the same input size).
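For example, a minimal sketch from the shell (the query file name and the 64 MB value are placeholders; tune the value to your data):

hive --hiveconf hive.exec.reducers.bytes.per.reducer=67108864 -f your_query.hql

If needed, hive.exec.reducers.max caps how many reducers Hive may launch.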

Also, you can try to increase the reducer container memory and Java heap size.
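A minimal sketch, again assuming the job's driver accepts generic options via ToolRunner; the memory values are only illustrative and should fit within your cluster's container limits:

hadoop jar your_jar.jar -Dmapreduce.reduce.memory.mb=4096 -Dmapreduce.reduce.java.opts=-Xmx3276m ...

Here mapreduce.reduce.memory.mb sizes the YARN container for each reducer, and mapreduce.reduce.java.opts sets the JVM heap inside it (commonly around 80% of the container size).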

leftjoin