I have a Spark job that reads millions of records from HDFS, processes them, and writes them back to HDFS in Avro format. I have observed that many of the written files remain in the .avro.tmp state.
I am using the Kite SDK to write the data in Avro format. The environment is CDH 5.5.
Could this be because the Spark driver terminates as soon as it finishes reading the records and dispatching them to the executors (which do the actual writing)?
If that is the case, how do I ensure that the job does not terminate until all the .tmp files have been renamed to .avro? Or what else could be the reason?
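For context, my understanding of the .tmp mechanism (illustrated with a plain-Java sketch below; this is not Kite's actual code, just the general write-to-temp-then-rename-on-close pattern) is that data is written to a `*.tmp` side file and only renamed to its final name after the writer is successfully closed, so a writer that is never closed (e.g. because the JVM exits early) leaves the `.tmp` file behind:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpRenameDemo {

    // Write to a *.tmp side file and rename it to the target name only
    // after close() succeeds. If the process dies before close(), the
    // .tmp file is what remains -- which matches the symptom I am seeing.
    static void writeAtomically(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (Writer w = Files.newBufferedWriter(tmp)) {
            w.write(content);
        } // close() flushes; only now is the file complete
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("demo");
        Path out = dir.resolve("part-0001.avro");
        writeAtomically(out, "records...");
        System.out.println(Files.exists(out));                                 // true
        System.out.println(Files.exists(dir.resolve("part-0001.avro.tmp")));   // false
    }
}
```

If this mental model is right, the question becomes: what would prevent the executor-side writers from being closed before the driver exits?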