
When we use Spark to write out files on AWS S3 or Azure Blob Storage, we can simply write:

df.write.parquet("/online/path/folder")

then the contents will be written to hundreds of files under the specified folder, like this:

/online/path/folder/f-1
/online/path/folder/f-2
...
/online/path/folder/f-100

My question is: since the write is executed by tens or hundreds of Spark executors simultaneously, how do they avoid writing to the same file? Another important question: what if some executor fails and is restarted? Will the restarted executor write to the same file it was writing before it failed?

derek

1 Answer


Spark adds a UUID, the partition number, and other task-related information to the file name, which guarantees that file names are unique across all executors and tasks.

part-00000-a4ec413d-cb30-4103-afe1-410c11a164e8-c000.snappy.parquet
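
As a quick illustration (a minimal sketch using a local PySpark session and an illustrative /tmp path), each task writes one part file whose name embeds its partition number plus a job-level UUID shared by all tasks, so concurrent writers can never collide:

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("naming-demo").getOrCreate()

# Force four partitions, hence four output files written concurrently.
df = spark.range(1000).repartition(4)
df.write.mode("overwrite").parquet("/tmp/naming-demo")

# Part files look like part-<partition>-<job uuid>-c000.snappy.parquet
# (plus a _SUCCESS marker); the UUID is identical across files, while
# the partition number differs.
for name in sorted(os.listdir("/tmp/naming-demo")):
    print(name)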

A similar question is answered here: Could anyone please explain what c000 means in c000.snappy.parquet or c000.snappy.orc?

By default, Spark writes files to a temporary folder and waits for all reducers to complete; then it executes a commit-job operation that moves all the files to the destination folder. Thus, in case of failure, Spark can safely start a new executor to complete the failed tasks and rewrite their results.
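
For illustration, the two commit algorithms discussed in the comments below are selected via a standard Hadoop property; a hedged sketch (check the default for your Spark/Hadoop versions before relying on it):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("commit-demo")
         # "1" = v1: each task commits into a job attempt directory and the
         # driver moves everything once at job commit (slower, safer).
         # "2" = v2: each task moves its files to the destination as soon as
         # it commits (faster, but a failed job can leave partial output).
         .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
         .getOrCreate())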

falcon-le0
  • Suppose I have 100 executors writing results to a temp folder "/tmp/random-folder/*.parquet". Now one executor fails and is restarted; in this case, will the restarted executor continue writing to "/tmp/random-folder/*.parquet"? I ask because I got some "specified blob already exists" errors. – derek Nov 26 '20 at 00:29
  • Do you use any of these config properties in your Spark application: spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored or spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version ? This may lead to such issues. – falcon-le0 Nov 26 '20 at 07:59
  • each task attempt writes to a task attempt dir, and when told to commit it renames the task attempt dir to the job attempt dir (v1 commit), or moves each individual file to the dest (v2). There's a lot more to that, key for s3 as a dest is: neither commit algorithm is safe there – stevel Nov 27 '20 at 11:40
  • @falcon-le0 I did not use either of those two configs. – derek Nov 30 '20 at 05:46
  • @stevel I am using Spark 2.4. Does that mean I am using v2? – derek Nov 30 '20 at 05:51
  • unless you are setting up the s3a committers, you will be using FileOutputCommitter with v2 commit algorithm by default – stevel Dec 01 '20 at 13:24
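
For reference, a hedged sketch of what "setting up the s3a committers" looks like; the property and class names come from the hadoop-aws and spark-hadoop-cloud modules, which must be on the classpath (verify they exist in your build before use):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("s3a-committer-demo")
         # Pick an S3A committer: "directory", "partitioned", or "magic".
         .config("spark.hadoop.fs.s3a.committer.name", "directory")
         # Bind Spark's commit protocol to Hadoop's PathOutputCommitter.
         .config("spark.sql.sources.commitProtocolClass",
                 "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
         .config("spark.sql.parquet.output.committer.class",
                 "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
         .getOrCreate())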