I'm trying to write streaming data in Spark to Delta format, but it won't let me use update in outputMode(). Below are my code and the error message:

```python
deltaStreamingQuery = (eventsDF
  .writeStream
  .format("delta")
  .option("checkpointLocation", checkpointPath)
  .outputMode("update")
  .queryName("stream_1p")
  .start(writePath)
)
```

```
AnalysisException: 'Data source com.databricks.sql.transaction.tahoe.sources.DeltaDataSource does not support Update output mode;'
```
Alex Ott
efsee

1 Answer

Currently, Databricks Delta only supports append and complete as the outputMode for sinks. append adds new rows to the table, and complete overwrites the whole table on every trigger, so complete may be what you are looking for to incorporate updates.

The official documentation is here => https://docs.databricks.com/delta/delta-streaming.html
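
As a rough illustration of choosing between the two modes, here is a minimal sketch. The helper names `pick_output_mode` and `start_delta_stream` are hypothetical, not part of any Spark or Delta API, and the rule encoded here assumes that Spark only accepts complete mode when the query contains a streaming aggregation.

```python
def pick_output_mode(has_streaming_aggregation):
    """Pick an output mode that the Delta sink accepts (hypothetical helper).

    Assumption: the Delta sink takes only "append" and "complete", and
    "complete" is only valid when the query has a streaming aggregation.
    """
    return "complete" if has_streaming_aggregation else "append"


def start_delta_stream(events_df, write_path, checkpoint_path,
                       has_streaming_aggregation=False):
    """Start a streaming write to a Delta path using a supported output mode."""
    return (events_df
        .writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .outputMode(pick_output_mode(has_streaming_aggregation))
        .queryName("stream_1p")
        .start(write_path)
    )
```

With the DataFrame from the question, this would be called as `start_delta_stream(eventsDF, writePath, checkpointPath)`, which falls back to append because no aggregation is declared.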

thePurplePython
  • Thanks for the answer, but when I try to use complete, it gives this error: **'Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets'** – efsee Aug 30 '19 at 17:28
  • are you doing any aggregations? you would need to be performing aggregations for complete mode ... try append mode if you aren't performing any aggregations – thePurplePython Aug 30 '19 at 17:33
  • this is metadata table streaming job, so I just want to replace old records with new ones, like what update would do – efsee Aug 30 '19 at 17:36
  • ok well since you aren't performing aggregation perhaps drop the streaming and just use delta lake to update the table. – thePurplePython Aug 30 '19 at 18:01
  • I don't quite understand. I want to keep a streaming job because I want to replace the entire table every time the corresponding input file gets overwritten in my storage (for example, S3) – efsee Aug 30 '19 at 18:05
  • Delta doesn't appear to support update in streaming mode ... take a look at this for Delta updates, though: https://docs.databricks.com/delta/delta-update.html ... also, the Kafka sink supports update in streaming mode if you have Kafka – thePurplePython Aug 30 '19 at 18:17
  • Try updating the table using Merge https://docs.databricks.com/delta/delta-update.html#merge-examples – darekarsam Jan 27 '20 at 20:15
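
The Merge suggestion can be combined with a streaming job by running the merge once per micro-batch through `foreachBatch`. The following is a minimal sketch, not a definitive implementation. It assumes Spark 2.4+ and the Delta Lake Python API (`delta.tables.DeltaTable`), that a Delta table already exists at the target path, and that the records carry a key column. The helpers `merge_condition` and `make_upsert` and the key column `id` are hypothetical names introduced for illustration.

```python
def merge_condition(key_columns, target="t", source="s"):
    """Build the MERGE join predicate, e.g. 't.id = s.id' (hypothetical helper)."""
    return " AND ".join(f"{target}.{c} = {source}.{c}" for c in key_columns)


def make_upsert(spark, target_path, key_columns):
    """Return a foreachBatch callback that merges each micro-batch into a Delta table."""
    def upsert(micro_batch_df, batch_id):
        # Imported lazily so this module can be loaded without Delta installed.
        from delta.tables import DeltaTable

        # MERGE fails when several source rows match the same target row,
        # so keep one (arbitrary) row per key within the micro-batch.
        source = micro_batch_df.dropDuplicates(key_columns)

        # Assumption: a Delta table already exists at target_path.
        (DeltaTable.forPath(spark, target_path).alias("t")
            .merge(source.alias("s"), merge_condition(key_columns))
            .whenMatchedUpdateAll()      # replace old records with new ones
            .whenNotMatchedInsertAll()   # insert records with unseen keys
            .execute())
    return upsert


# Usage with the names from the question ("id" is an assumed key column):
# query = (eventsDF.writeStream
#     .foreachBatch(make_upsert(spark, writePath, ["id"]))
#     .option("checkpointLocation", checkpointPath)
#     .queryName("stream_1p")
#     .start())
```

Because the sink is `foreachBatch` rather than the Delta format writer, the output-mode restriction from the error message does not come into play here; the Delta table is updated by the merge inside the callback.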