
I am parsing data to make sense of it through a MapReduce job. The parsed data comes in batches and is then loaded into a Hive external table through a Spark Streaming job. This is a real-time process. Today I hit an unusual event: a _temporary directory got created in the output location, due to which loading into the Hive table failed, since a directory can't be loaded into a Hive table. It happened only once, and the rest of the jobs are running fine. Please refer to the screenshot.

[Screenshot of output location]

The _temporary directory in turn contains task IDs as subdirectories, all of which are empty. Can anyone please help resolve this so that it can be avoided in the future?
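As a defensive stopgap (a minimal sketch, not a confirmed fix; the output path below is illustrative, and using Hadoop's FileSystem API here is an assumption about the environment), the load step could first delete any leftover _temporary directory:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CleanLeftoverTemporary {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    // Illustrative path only; substitute the external table's location.
    val outputDir = new Path("/data/output/batches")
    val tempDir   = new Path(outputDir, "_temporary")

    // A leftover _temporary directory means a committer did not finish
    // (or did not clean up); remove it recursively before the Hive load.
    if (fs.exists(tempDir)) {
      fs.delete(tempDir, true)
      println(s"Removed leftover directory: $tempDir")
    }
  }
}
```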

Mohit Sudhera

1 Answer


The _temporary directory is created while some tasks have yet to complete; there may still be data that has not been moved from its temporary location to its final location. A task may show as completed in the web UI while the data movement is still in progress. Once this process completes, only the _SUCCESS file should remain. You can check this by monitoring the size of the _temporary directory; it should gradually decrease.
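For example, a minimal polling sketch using Hadoop's FileSystem API (the path below is illustrative, not from the original job) that watches the directory shrink and disappear as the commit finishes:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WatchTemporary {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    // Illustrative path only; point this at the job's output location.
    val tempDir = new Path("/data/output/batches/_temporary")

    // Poll until the committer finishes; the directory should shrink
    // and finally disappear, leaving only the _SUCCESS marker behind.
    while (fs.exists(tempDir)) {
      val bytes = fs.getContentSummary(tempDir).getLength
      println(s"$tempDir still holds $bytes bytes")
      Thread.sleep(5000)
    }
    println("Commit complete: _temporary is gone")
  }
}
```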

moasifk
  • There was nothing in this directory, and the batch completed successfully. Ideally it should delete the _temporary directory, but even after the job completed, the directory was still there. – Mohit Sudhera Jun 05 '17 at 13:06
  • Hey buddy, I'm facing the same issue here. Have you found a workaround to solve this phantom _temporary issue? @MohitRaja – KAs Oct 10 '17 at 06:49