I am running Spark jobs on an AWS EMR cluster, submitting them from a client host machine. The client machine is just an EC2 instance that submits jobs to EMR via YARN in cluster mode.
The problem is that Spark saves temp files, each about 200 MB, like:
/tmp/spark-456184c9-d59f-48f4-9b0560b7d310655/__spark_conf__6943938018805427428.zip
The /tmp folder fills up with such files very quickly, and jobs start failing with the error:
No space left on device
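To confirm that it is /tmp filling up (and not some other filesystem), I run a quick check on the client machine like this (just a sanity check, not part of any job):

```shell
# show free space on the filesystem that backs /tmp
df -h /tmp

# list Spark scratch directories by size, largest last (if any exist)
du -sh /tmp/spark-* 2>/dev/null | sort -h || true
```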
I tried configuring spark.local.dir
in spark-defaults.conf to point to my S3 bucket, but Spark prepends the user's home directory to the path, so it ends up like this: /home/username/s3a://my-bucket/spark-tmp-folder
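For reference, the relevant line in my spark-defaults.conf looks roughly like this (bucket name and folder are placeholders):

```
# spark-defaults.conf -- the setting I tried; Spark treats the s3a:// URI
# as a relative local path and prefixes it with my home directory
spark.local.dir    s3a://my-bucket/spark-tmp-folder
```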
Could you please suggest how I can fix this problem?