
1) I have a map-only Hadoop job that streams data to a Cassandra cluster.

2) Sometimes the streaming takes more than 10 minutes, and because no progress is reported to the job during that time, Hadoop kills the task.

3) I tried reporting progress with the context.progress() method, but it did not help.

Is anything else needed to report progress to the Hadoop job?

I wrote the following sample code to simulate the issue:

Thread.sleep(360000);

context.progress();

Thread.sleep(360000);

It fails with the following error message:

12/02/06 11:40:25 INFO mapred.JobClient: Task Id : attempt_201202061119_0001_m_000001_1, Status : FAILED
Task attempt_201202061119_0001_m_000001_1 failed to report status for 601 seconds. Killing!

samarth

2 Answers


context.progress() should work, but you may be hitting the following issue: https://issues.apache.org/jira/browse/MAPREDUCE-1905, which is fixed in later versions.
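
If a single blocking call, such as the streaming step, runs longer than the timeout, the mapper thread has no chance to call context.progress() in between. One common workaround is to report progress from a background thread while the call runs. The following is a minimal sketch under assumptions, not a tested fix for this job: ProgressHeartbeat is a hypothetical helper name, the 60-second interval in the usage comment is arbitrary, and the callback is assumed to be context::progress from the question. It can only help if progress() itself is honored; on a version affected by the issue linked above it would presumably be ignored as well.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: invokes a progress callback at a fixed rate until closed. */
final class ProgressHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    ProgressHeartbeat(Runnable reportProgress, long interval, TimeUnit unit) {
        // Daemon thread, so a forgotten heartbeat never keeps the task JVM alive.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true);
            return t;
        });
        // First call happens immediately, then once per interval.
        scheduler.scheduleAtFixedRate(reportProgress, 0, interval, unit);
    }

    /** Stops the heartbeat; no further callbacks are started after this returns. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}

// Assumed usage inside map(...), with `context` being the Mapper.Context:
//
// try (ProgressHeartbeat hb =
//          new ProgressHeartbeat(context::progress, 60, TimeUnit.SECONDS)) {
//     streamToCassandra(); // hypothetical long-running call
// }
```

The try-with-resources form stops the heartbeat as soon as the long call returns or throws. Note that a heartbeat like this also hides genuine hangs, since the task keeps reporting progress even if the streaming call never finishes.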

Harinder

Please see this question:
How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."

Setting the mapred.task.timeout property to a higher value is the easiest way to fix this problem.
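
For reference, mapred.task.timeout is expressed in milliseconds, and its default of 600000 matches the roughly 600 seconds in the error message. Below is a small sketch of building the corresponding command-line option, assuming the job driver parses generic options (for example via ToolRunner). TaskTimeout and asGenericOption are hypothetical names used only for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for expressing the task timeout in the unit Hadoop expects. */
final class TaskTimeout {
    /** Property name from the answer; its value is in milliseconds. */
    static final String PROPERTY = "mapred.task.timeout";

    /** Converts a timeout in minutes to the millisecond value of the property. */
    static long toMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    /** Builds a generic -D option for a timeout given in minutes. */
    static String asGenericOption(long minutes) {
        return "-D" + PROPERTY + "=" + toMillis(minutes);
    }
}
```

For example, asGenericOption(30) yields -Dmapred.task.timeout=1800000, which can be passed on the hadoop jar command line. Alternatively, the same millisecond value can be set in the driver with conf.setLong("mapred.task.timeout", ...) before the job is submitted. A larger timeout only postpones the kill, so it should comfortably exceed the longest expected streaming time.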

wlk