
My Reducer runs a loop with a large number of iterations, and each iteration calls a computation-heavy function:

while (context.getCounter(SOLUTION_FLAG.SOLUTION_FOUND).getValue() < 1 && itrCnt < MAX_ITR)

MAX_ITR is the iteration limit, supplied as user input.
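
For context, this is roughly how my reduce method is structured (a minimal sketch; the class name, the `max.itr` config key, and `doHeavyComputation` are stand-ins for my actual code):

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {

    public enum SOLUTION_FLAG { SOLUTION_FOUND }

    private long MAX_ITR;  // iteration limit, passed in through the job configuration

    @Override
    protected void setup(Context context) {
        MAX_ITR = context.getConfiguration().getLong("max.itr", 1000L);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long itrCnt = 0;
        // Keep iterating until the SOLUTION_FOUND counter is set
        // or the user-supplied iteration limit is reached.
        while (context.getCounter(SOLUTION_FLAG.SOLUTION_FOUND).getValue() < 1
                && itrCnt < MAX_ITR) {
            doHeavyComputation();  // long-running work, no output in between
            itrCnt++;
        }
    }

    private void doHeavyComputation() {
        // placeholder for the expensive per-iteration work
    }
}
```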

The problem is that when I run it on a Hadoop cluster, the Reducer task times out and is killed:

17/05/06 21:09:43 INFO mapreduce.Job: Task Id : attempt_1494129392154_0001_r_000000_0, Status : FAILED
AttemptID:attempt_1494129392154_0001_r_000000_0 Timed out after 600 secs

What should I do to avoid the timeout? (My guess is that I need to send heartbeat signals back to the framework.)

Avinash L

1 Answer


The likely reason for the timeout is a long-running computation in the reducer that never reports progress back to the Hadoop framework. You can try increasing the timeout from the default 600 seconds by setting the property below (the value is in milliseconds):

mapred.task.timeout=1800000
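
For example, the property can be set on the job configuration in the driver. In Hadoop 2.x the preferred name is `mapreduce.task.timeout`; `mapred.task.timeout` is the deprecated alias (the driver class below is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Value is in milliseconds: 1800000 ms = 30 minutes.
        // Setting it to 0 disables the timeout entirely, which is rarely a good idea.
        conf.setLong("mapreduce.task.timeout", 1800000L);
        Job job = Job.getInstance(conf, "long-running-job");
        // ... set mapper/reducer classes, input/output paths, then submit
    }
}
```

The same value can also be passed on the command line as `-D mapreduce.task.timeout=1800000` if the driver uses ToolRunner.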

Here is a more detailed reference on this setting.

If increasing the timeout doesn't help, recheck your code; there could be an issue with the logic too. In particular, if the computation is legitimately long, report progress from inside the loop so the framework knows the task is still alive, as in the sketch below.
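
A minimal sketch of that approach, assuming a long-running loop inside reduce (the class name, iteration limit, and `doHeavyComputation` placeholder are illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ProgressReportingReducer extends Reducer<Text, Text, Text, Text> {

    private static final long MAX_ITR = 100000L;  // illustrative limit

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long itrCnt = 0;
        while (itrCnt < MAX_ITR) {
            doHeavyComputation();
            // Tell the framework the task is still alive; this resets the
            // task-timeout clock without emitting any output.
            context.progress();
            // Updating the status string (or a counter) also counts as progress.
            context.setStatus("iteration " + itrCnt);
            itrCnt++;
        }
    }

    private void doHeavyComputation() {
        // placeholder for the expensive per-iteration work
    }
}
```

Calling context.progress(), incrementing a counter, or setting the status string all reset the timeout clock, so even a reduce call that runs for hours will not be killed as long as it checks in regularly.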

Sandeep Singh
  • Actually, [this](http://stackoverflow.com/a/11815803/7584363) one solved the issue perfectly, but the reference link was good too. Thanks – Avinash L May 07 '17 at 10:47