I am running a job in 9 nodes.
All of them are going to write some information to files doing simple writes like below:
dfLogging.coalesce(1).write.format('delta').mode('append').save('/dbfs/' + loggingLocation)
However I am receiving this exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o106.save. : java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 14.0 failed 1 times, most recent failure: Lost task 1.0 in stage 14.0 (TID 259, localhost, executor driver): org.apache.hadoop.fs.ChecksumException: Checksum error: file:/dbfs/delta/Logging/_delta_log/00000000000000000063.json at 0 exp: 1179219224 got: -1020415797
It looks to me, that because of concurrency, spark is somehow failing and it generates checksum errors.
Is there any known scenario that may be causing it?