I am using foreachBatch to write streaming data into multiple targets, and it works fine for the first microbatch. When the second microbatch runs, it fails with the error below.
"StreamingQueryException: Query [id = 0d8e45ff-4f3a-42c0-964d-6f41c93df801, runId = 186a22bf-c75e-482b-bd4b-19b039a9aa38] terminated with exception: abfss://xxxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory1 already exists"
Below is the foreachBatch snippet I used.
val df_new = <<<some streaming dataset>>>
val appId = "1dbcd4f2-eeb7-11ed-a05b-0242ac120003"

df_new.writeStream.format("delta")
  .option("mergeSchema", "true")
  .outputMode("append")
  .option("checkpointLocation", "abfss://xxx@xxxxxxxxxx.dfs.core.windows.net/checkpoint/chkdir")
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // Cache the microbatch since it is written out to two targets
    batchDF.persist()

    // Target 1: FC messages, with the FC-specific columns dropped
    val fc_final = batchDF.filter(col("msg_type") === "FC")
      .drop(columnlist_fc: _*)
    fc_final.write
      .option("txnVersion", batchId).option("txnAppId", appId)
      .save("abfss://xxxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory1")

    // Target 2: everything else, partitioned by occurrence month
    val hb_final = batchDF.filter(col("msg_type") =!= "FC")
      .drop(columnlist_hb: _*)
    hb_final.write.partitionBy("occurrence_month")
      .option("txnVersion", batchId).option("txnAppId", appId)
      .save("abfss://xxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory2")

    batchDF.unpersist()
    ()
  }.start().awaitTermination()
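In case it helps, here is a stripped-down, self-contained sketch of the same pattern I am following. The rate source, the derived msg_type/occurrence_month columns, and the local /tmp paths are placeholders standing in for my real input and ADLS directories; the column lists are simplified away, but the structure mirrors the snippet above.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object MultiTargetStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("foreachBatch-multi-target").getOrCreate()

    // Fake streaming source standing in for my real input; msg_type and
    // occurrence_month are derived so the two filters below have something to split on
    val df_new = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 10)
      .load()
      .withColumn("msg_type", when(col("value") % 2 === 0, lit("FC")).otherwise(lit("HB")))
      .withColumn("occurrence_month", date_format(col("timestamp"), "yyyy-MM"))

    val appId = "1dbcd4f2-eeb7-11ed-a05b-0242ac120003"

    df_new.writeStream
      .option("checkpointLocation", "/tmp/chk/multitarget")   // placeholder path
      .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
        batchDF.persist()

        // Target 1: FC messages
        batchDF.filter(col("msg_type") === "FC")
          .write
          .option("txnVersion", batchId).option("txnAppId", appId)
          .save("/tmp/delta/directory1")                       // placeholder path

        // Target 2: everything else, partitioned by month
        batchDF.filter(col("msg_type") =!= "FC")
          .write.partitionBy("occurrence_month")
          .option("txnVersion", batchId).option("txnAppId", appId)
          .save("/tmp/delta/directory2")                       // placeholder path

        batchDF.unpersist()
        ()
      }
      .start()
      .awaitTermination()
  }
}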
What am I missing here? Why is it not able to append the data files to the Delta directory even though I specified append mode? Your help is much appreciated.