I'm building a custom output format for Hadoop and was wondering whether there is a way, from within the output format, to know when all reducers (RecordWriters) have completed.

To detect that a single RecordWriter has completed, the close method of RecordWriter can be used, but how can I run some cleanup once all of the RecordWriters have completed?

nomier

1 Answer

You can do the final cleanup in the driver itself instead of relying on the OutputFormat. I doubt the OutputFormat really provides such a feature in its API. Relying on the finalize method would be a last resort, and I would not advise it.

The waitForCompletion method of Job returns only after the job finishes, so you can simply do:

boolean status = job.waitForCompletion(true);
if (status) {
    // cleanup required for successful jobs
} else {
    // cleanup required for failed jobs
}

If your cleanup does not depend on the job's success or failure, just remove the if-else part. And if you really need a method in your OutputFormat class to do the cleanup, make it static, e.g.:

job.waitForCompletion(true);
CustomOutputFormat.cleanUp();
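
If the cleanup must also run when submitting or waiting on the job throws, a try/finally in the driver is one way to get that. The following is only a minimal sketch with no Hadoop dependency: `DriverCleanupSketch` and `runWithCleanup` are hypothetical names, the `BooleanSupplier` stands in for the call to `job.waitForCompletion(true)`, and the `Runnable` stands in for something like `CustomOutputFormat.cleanUp()`. In a real driver you would call `job.waitForCompletion(true)` directly inside the try block, since it throws checked exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class DriverCleanupSketch {

    /**
     * Runs the job, then always runs the cleanup, whether the job
     * reported success, reported failure, or threw an exception.
     * Returns the job's own success flag.
     */
    public static boolean runWithCleanup(BooleanSupplier submitAndWait, Runnable cleanUp) {
        try {
            return submitAndWait.getAsBoolean();
        } finally {
            // Executes on every path out of the try block.
            cleanUp.run();
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();

        // Stand-in for a job that succeeds.
        boolean ok = runWithCleanup(() -> true,
                () -> events.add("cleanup after success"));

        // Stand-in for a job that fails.
        boolean failed = runWithCleanup(() -> false,
                () -> events.add("cleanup after failure"));

        // Stand-in for a submission that throws; cleanup still runs,
        // and the exception still reaches the driver.
        try {
            runWithCleanup(() -> { throw new IllegalStateException("submit failed"); },
                    () -> events.add("cleanup after exception"));
        } catch (IllegalStateException e) {
            events.add("driver saw: " + e.getMessage());
        }

        System.out.println(ok + " " + failed + " " + events);
    }
}
```

Note that this still runs in the driver process, so it only helps as long as the driver itself stays alive until the job ends.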

I hope this meets your needs.

blackSmith
  • I tried this solution and it's the best I have so far, but what I wanted was a cleanup that is independent of the driver running the job, and this won't achieve that. I didn't see anything in the API that mentions this is supported. – nomier Nov 12 '14 at 21:41