I am using foreachBatch to write streaming data to multiple targets. It works fine for the first micro-batch, but when the second micro-batch runs, it fails with the error below. "StreamingQueryException: Query [id = 0d8e45ff-4f3a-42c0-964d-6f41c93df801, runId = 186a22bf-c75e-482b-bd4b-19b039a9aa38] terminated with exception: abfss://xxxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory1 already exists"

Below is the foreachBatch snippet I used.

df_new = <<<some streaming dataset>>>
  
val appId = "1dbcd4f2-eeb7-11ed-a05b-0242ac120003" 
    
df_new.writeStream.format("delta")
  .option("mergeSchema", "true").outputMode("append")
  .option("checkpointLocation", "abfss://xxx@xxxxxxxxxx.dfs.core.windows.net/checkpoint/chkdir")
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
      batchDF.persist()
      val fc_final= batchDF.filter(col("msg_type") === "FC" )
        .drop(columnlist_fc:_*)
      fc_final.write
       .option("txnVersion", batchId).option("txnAppId", appId)
       .save("abfss://xxxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory1")
    
      val hb_final = batchDF.filter(col("msg_type") =!= "FC" )
        .drop(columnlist_hb:_*)
    
      hb_final.write.partitionBy("occurrence_month")
        .option("txnVersion", batchId).option("txnAppId", appId)
        .save("abfss://xxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory2")
      
      batchDF.unpersist()
    ()
    
    }.start().awaitTermination()

What am I missing here? Why is it not able to append the data files to the Delta directory even though I specified outputMode("append")? Any help is much appreciated.

Alex Ott
Nikesh

1 Answer

The problem is that neither of the .write calls inside .foreachBatch specifies a save mode. For batch writes the default is SaveMode.ErrorIfExists, which throws an error if data already exists at the target. The outputMode("append") set on the stream does not carry over to the batch writes inside foreachBatch. Change the save mode to SaveMode.Append if you want to append data:

fc_final.write
       .mode("append")
       .option("txnVersion", batchId).option("txnAppId", appId)
       .save("abfss://xxxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory1")
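
The same change applies to the second write (hb_final). As a rough sketch of how the whole foreachBatch could look with both writes in append mode: the object name, the writeBatch helper, the empty column lists and the explicit .format("delta") are assumptions for illustration and are not taken from the question. The paths and appId are copied from it as-is.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object ForeachBatchAppendSketch {
  // Copied from the question.
  val appId  = "1dbcd4f2-eeb7-11ed-a05b-0242ac120003"
  val fcPath = "abfss://xxxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory1"
  val hbPath = "abfss://xxx@xxxxxxxxxx.dfs.core.windows.net/primary/directory2"

  // Hypothetical placeholders: the question does not show these lists.
  val columnlistFc: Seq[String] = Seq.empty
  val columnlistHb: Seq[String] = Seq.empty

  // Hypothetical helper: writes one micro-batch to both targets.
  def writeBatch(batchDF: DataFrame, batchId: Long): Unit = {
    batchDF.persist()
    try {
      batchDF.filter(col("msg_type") === "FC")
        .drop(columnlistFc: _*)
        .write
        .format("delta")   // assumption: stated explicitly; the question relies on the default format
        .mode("append")    // without this, the batch writer defaults to ErrorIfExists
        .option("txnVersion", batchId)
        .option("txnAppId", appId)
        .save(fcPath)

      batchDF.filter(col("msg_type") =!= "FC")
        .drop(columnlistHb: _*)
        .write
        .format("delta")   // same assumption as above
        .mode("append")
        .partitionBy("occurrence_month")
        .option("txnVersion", batchId)
        .option("txnAppId", appId)
        .save(hbPath)
    } finally {
      // Release the cached batch even if one of the writes fails.
      batchDF.unpersist()
    }
  }

  // dfNew stands in for the streaming dataset from the question.
  def run(dfNew: DataFrame): Unit =
    dfNew.writeStream
      .option("checkpointLocation", "abfss://xxx@xxxxxxxxxx.dfs.core.windows.net/checkpoint/chkdir")
      .foreachBatch(writeBatch _)
      .start()
      .awaitTermination()
}
```

Wrapping the writes in try/finally is a design choice in this sketch, so that unpersist() still runs when a write throws.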
Alex Ott