I'm looping through some CSV files in a folder. I want to write these CSV files as Delta tables only if they are all valid. Each CSV file in the folder has a different name and schema. I want to reject the entire folder and all the files it contains until the data are fixed. I'm running a lot of tests, but ultimately I have to actually write the files as Delta tables with the following loop (simplified for this question):
for f in files:
    # read csv
    df = spark.read.csv(f, header=True, schema=schema)
    # append to the already existing delta table
    df.write.format("delta").mode("append").save('path/' + f)
Is there a callback mechanism so that the write method is executed only if none of the DataFrames raise any errors? Delta table schema enforcement is pretty rigid, which is great, but errors can pop up at any time despite all the tests I'm running before passing these files to this loop.
union is not an option because I want to handle this by date, and each file has a different schema and name.
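To make the intent concrete, here is a rough two-phase sketch of what I'm aiming for: read and fully materialize every file first, and only start writing once the whole folder has passed. The FAILFAST read mode and the explicit count() are my guesses at forcing Spark's lazy reads to fail early, and schema_for(f) is just a placeholder for however each file's schema gets looked up; I realize this still wouldn't catch Delta schema-enforcement errors, which only show up at write time.

# rough sketch: validate every file first, write only if the whole folder passes
validated = []
try:
    for f in files:
        # schema_for(f) is a placeholder for the per-file schema lookup
        df = spark.read.csv(f, header=True, schema=schema_for(f), mode="FAILFAST")
        df.count()  # force a full scan so malformed rows fail here, not at write time
        validated.append((f, df))
except Exception as e:
    # any failure rejects the whole folder; nothing has been written yet
    print(f"Folder rejected, nothing written: {e}")
else:
    for f, df in validated:
        # append mode assumed, matching the already existing Delta tables
        df.write.format("delta").mode("append").save('path/' + f)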