I have a scenario where I want to run a Spark Structured Streaming job that reads a Databricks Delta source table and extracts only the inserts to the source, filtering out any updates and deletes.

I tried the following on a smaller table, but the code does not seem to do what I expect.

spark
  .readStream
  .format("delta")
  .option("latestFirst", "true")
  .option("ignoreDeletes", "true")   // do not fail the stream on transactions that delete data
  .option("ignoreChanges", "true")   // do not fail the stream on transactions that update data
  .load("/mnt/data-lake/data/bronze/accounts")
  .writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "/mnt/data-lake/tmp/chkpnt_accounts_inserts")
  .option("path", "/mnt/data-lake/tmp/accounts_inserts")
  .start()
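
For what it's worth, one alternative I came across is Delta's Change Data Feed, which tags every change row with a _change_type column (insert, update_preimage, update_postimage, delete), so inserts can be filtered explicitly rather than relying on ignoreChanges. Below is a minimal, untested sketch of that idea; it assumes the source table has the table property delta.enableChangeDataFeed = true set, and the sink/checkpoint paths are placeholders I made up:

import org.apache.spark.sql.functions.col

spark
  .readStream
  .format("delta")
  .option("readChangeFeed", "true")            // emit CDF rows with change metadata columns
  .load("/mnt/data-lake/data/bronze/accounts")
  .filter(col("_change_type") === "insert")    // keep inserts only; drop updates/deletes
  .drop("_change_type", "_commit_version", "_commit_timestamp")
  .writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "/mnt/data-lake/tmp/chkpnt_accounts_cdf")  // hypothetical path
  .option("path", "/mnt/data-lake/tmp/accounts_inserts_cdf")               // hypothetical path
  .start()

Would this be a more reliable way to isolate inserts, or is there a way to make the original options-based approach work?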