I am running a MapReduce job through the map-reduce action of an Oozie workflow. In the reducer I use MultipleOutputs because I want the output written to different directories, and that part works as expected. The only issue is that the output files end up inside a _temporary directory and a _taskid directory. For example, the output is written to: /user/sajain/output/_temp/_attempt_201702011607_103192_r_000003_1/file1.xml

My expected output is: /user/sajain/output/file1.xml

The job completes successfully. According to the official Oozie docs, the temporary directory should be removed at the end of a successful job. Can anyone please help?
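To make the expected relocation concrete, here is a minimal sketch of the path mapping I am hoping for. It only manipulates path strings and does not touch HDFS. The class name `CommitPathSketch` and the method `finalPathFor` are hypothetical names used for illustration; they are not part of Hadoop or Oozie. It assumes the temporary segments are named `_temp` or `_temporary` and that task-attempt segments start with `_attempt_`, as in the path above.

```java
import java.util.ArrayList;
import java.util.List;

class CommitPathSketch {

    // Hypothetical helper: drops the temporary and task-attempt
    // segments from a path, leaving the location I expect the file
    // to have once the job has finished.
    public static String finalPathFor(String tempPath) {
        List<String> kept = new ArrayList<>();
        for (String segment : tempPath.split("/")) {
            // Assumption: these are the only segment names to strip.
            boolean temporary = segment.equals("_temp") || segment.equals("_temporary");
            boolean attempt = segment.startsWith("_attempt_");
            if (!temporary && !attempt) {
                kept.add(segment);
            }
        }
        return String.join("/", kept);
    }

    public static void main(String[] args) {
        String observed =
            "/user/sajain/output/_temp/_attempt_201702011607_103192_r_000003_1/file1.xml";
        // prints /user/sajain/output/file1.xml
        System.out.println(finalPathFor(observed));
    }
}
```

This only describes the result I expect to see in the output directory; it says nothing about how Hadoop actually performs the move.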

jsanjayce
  • Is this an issue with how I am using MultipleOutputs? Please help. – jsanjayce May 02 '17 at 17:38
  • I checked the logs (below) and found that the job is not running the output committer, which would remove the temporary directory. Logs: 2017-05-04 08:51:42,356 INFO org.apache.hadoop.mapred.Task: Task:attempt_201702011607_109046_r_000003_0 is done. And is in the process of commiting 2017-05-04 08:51:42,398 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201702011607_109046_r_000003_0' done. – jsanjayce May 04 '17 at 09:06

0 Answers