
In my application, I have a series of 5 Hadoop Jobs which are chained together sequentially using

Job.waitForCompletion(false)
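To make the setup concrete, the chaining looks roughly like this (a minimal sketch; the per-stage mapper/reducer/input/output configuration is elided):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobChain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        for (int i = 1; i <= 5; i++) {
            // Placeholder stage: the real mapper/reducer/input/output setup is elided.
            Job job = new Job(conf, "stage-" + i);
            // Block until this stage finishes; abort the whole chain if it fails.
            if (!job.waitForCompletion(false)) {
                System.exit(1);
            }
        }
    }
}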

Now, the Hadoop docs clearly state

...the onus on ensuring jobs are complete 
(success/failure) lies squarely on the clients

Now, if my job client program crashes, how do I ensure that the job client program can resume at the point of crash when it is restarted? Is there any way to query the JobTracker and get a handle to a specific job and thereafter check its job status?

cosmos
  • Not an answer, but you should check out http://incubator.apache.org/oozie/ — it's a job workflow engine that allows you to manage / recover from failures like this – Chris White May 30 '12 at 03:04
  • What happens if oozie itself crashes? Can it resume from the point of crash on restart? – cosmos May 30 '12 at 03:40
  • oozie uses the notification url configuration property - as each MR job completes, the job tracker notifies OOZIE via this url. If oozie crashes then once it restarts you can manually tell oozie to resume that particular workflow after the last job in the workflow completed – Chris White May 30 '12 at 10:19
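For reference, the job-end notification Chris White mentions is just a job configuration property; a rough sketch of setting it from the client (the property name job.end.notification.url is the Hadoop 1.x one, and the callback host/port below is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NotifyingJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The JobTracker issues an HTTP GET to this URL when the job ends,
        // substituting $jobId and $jobStatus; the host/port here is a placeholder.
        conf.set("job.end.notification.url",
                 "http://workflow-host:11000/callback?id=$jobId&status=$jobStatus");
        Job job = new Job(conf, "notified-stage");
        // Mapper/reducer/input/output setup elided.
        job.waitForCompletion(false);
    }
}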

1 Answer


The following approach can be tried when the client itself crashes:

Hadoop provides JobClient, which can be used to track the jobs currently running in the cluster. So when the client restarts, the following methods of JobClient can be used (a rough sketch follows the list):

  • jobsToComplete() - Get the jobs that are not completed and not failed.
  • getAllJobs() - Get all the jobs that have been submitted.
  • getClusterStatus() - Get status information about the Map-Reduce cluster.
  • submitJob(JobConf job) - Submit a job to the MR system (for example, to resubmit a stage that failed).
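A minimal sketch of what the restarted client could do with these methods (it uses the old org.apache.hadoop.mapred API that JobClient belongs to; the JobTracker address is an assumption, replace it with your cluster's host:port):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class ResumeAfterCrash {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        // Assumed JobTracker address; use your cluster's actual host:port.
        conf.set("mapred.job.tracker", "jobtracker-host:8021");
        JobClient client = new JobClient(conf);

        // Jobs that are neither completed nor failed (still running or preparing).
        for (JobStatus status : client.jobsToComplete()) {
            RunningJob running = client.getJob(status.getJobID());
            if (running != null) {
                System.out.println(running.getJobName()
                    + " still in progress, state=" + running.getJobState());
            }
        }

        // All submitted jobs (including finished ones), to work out where the chain stopped.
        for (JobStatus status : client.getAllJobs()) {
            System.out.println(status.getJobID() + " runState=" + status.getRunState());
        }
    }
}

From that information the client can decide which of the five stages still needs to be (re)submitted.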
Ash