
I am seeing the following error when I process large files (size > 35 GB), but it doesn't happen with smaller files (size < 10 GB).

App > Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#30
App >     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
App >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
App >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
App >     at java.security.AccessController.doPrivileged(Native Method)
App >     at javax.security.auth.Subject.doAs(Subject.java:422)
App >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
App >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
App > Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

The job still finishes under Qubole, since I think Qubole retries the reduce step.

But I was wondering if there is a setting that would let me avoid these errors entirely, so that the reduce job doesn't have to retry.

App > Failed reduce tasks=54
Jal

1 Answer


Increase reducer parallelism. You can do this by setting the mapreduce.job.reduces configuration property. If you are running a Java application whose driver uses ToolRunner (so that generic -D options are parsed), pass the property on the command line like this:

hadoop jar your_jar.jar -Dmapreduce.job.maps=100 -Dmapreduce.job.reduces=200 ...

In Hive, it can be done using the hive.exec.reducers.bytes.per.reducer property (lower values make Hive plan more reducers for the same input size).
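For example, a minimal sketch from the shell (the query file name and the 64 MB value are placeholders; tune the value to your data):

hive --hiveconf hive.exec.reducers.bytes.per.reducer=67108864 -f your_query.hql

If needed, hive.exec.reducers.max caps how many reducers Hive may launch.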

Also, you can try to increase the reducer container memory and Java heap size.
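A minimal sketch, again assuming the job's driver accepts generic options via ToolRunner; the memory values are only illustrative and should fit within your cluster's container limits:

hadoop jar your_jar.jar -Dmapreduce.reduce.memory.mb=4096 -Dmapreduce.reduce.java.opts=-Xmx3276m ...

Here mapreduce.reduce.memory.mb sizes the YARN container for each reducer, and mapreduce.reduce.java.opts sets the JVM heap inside it (commonly around 80% of the container size).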

leftjoin