I receive the following error:
Task attempt_201304161625_0028_m_000000_0 failed to report status for 600 seconds. Killing!
for my map tasks. This question is similar to this, this, and this. However, I do not want to increase the default time before Hadoop kills a task that fails to report progress, i.e.:
Configuration conf = new Configuration();
long milliSeconds = 1000 * 60 * 60; // one hour
conf.setLong("mapred.task.timeout", milliSeconds);
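(As an aside: on Hadoop 2.x this property was renamed, so on a 2.x cluster the equivalent configuration would be:)
Configuration conf = new Configuration();
// "mapreduce.task.timeout" is the Hadoop 2.x name; "mapred.task.timeout" is the deprecated 1.x alias
conf.setLong("mapreduce.task.timeout", 1000 * 60 * 60);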
Instead, I want to report progress periodically using context.progress(), context.setStatus("Some Message"), or context.getCounter(SOME_ENUM.PROGRESS).increment(1), or something similar. However, the task is still killed. Here are the snippets of code where I am attempting to report progress. The mapper:
protected void map(Key key, Value value, Context context)
        throws IOException, InterruptedException {
    // do some things
    Optimiser optimiser = new Optimiser();
    optimiser.optimiseFurther(<some parameters>, context);
    // more things
    context.write(newKey, newValue);
}
The optimiseFurther method within the Optimiser class:
public void optimiseFurther(<Some parameters>, TaskAttemptContext context) {
    int count = 0;
    while (something is true) {
        // optimise

        // try to report progress
        context.setStatus("Progressing:" + count);
        System.out.println("Optimise Progress:" + context.getStatus());
        context.progress();
        count++;
    }
}
The output from a mapper shows the status is being updated:
Optimise Progress:Progressing:0
Optimise Progress:Progressing:1
Optimise Progress:Progressing:2
...
However, the task is still being killed after the default timeout of 600 seconds. Am I using the context in the wrong way? Is there anything else I need to do in the job setup to report progress successfully?
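For reference, here is a minimal, self-contained mapper showing the counter-based variant mentioned above; CounterHeartbeatMapper and the Progress enum are illustrative names, and LongWritable/Text stand in for my real key and value types. It is killed in exactly the same way:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CounterHeartbeatMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    // Any enum constant can be used as a counter key.
    public enum Progress { ITERATIONS }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (int i = 0; i < 100000; i++) {
            // ... one long-running optimisation step ...

            // Each increment is supposed to register progress and reset the timeout clock.
            context.getCounter(Progress.ITERATIONS).increment(1);
        }
        context.write(key, value);
    }
}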