I have a scenario where I want to run a Spark Structured Streaming job that reads a Databricks Delta source table and extracts only the rows inserted into it, filtering out any updates and deletes.
I tried the following on a smaller table, but the code does not seem to do what I expect.
spark
  .readStream
  .format("delta")
  .option("latestFirst", "true")
  .option("ignoreDeletes", "true")   // skip commits that only delete data at partition boundaries
  .option("ignoreChanges", "true")   // tolerate commits that rewrite files; rewritten rows are re-emitted, not filtered out
  .load("/mnt/data-lake/data/bronze/accounts")
  .writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "/mnt/data-lake/tmp/chkpnt_accounts_inserts")
  .option("path", "/mnt/data-lake/tmp/accounts_inserts")
  .start()
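I am also wondering whether Delta Change Data Feed is the right tool here instead. Below is a minimal sketch of what I have in mind, assuming the table property delta.enableChangeDataFeed = true is set on the source table; the filter on _change_type and the separate checkpoint path are my assumptions, not tested code.

// Hypothetical alternative using Change Data Feed: keep only rows whose
// _change_type is "insert" and drop the CDF metadata columns before writing.
// Assumes delta.enableChangeDataFeed = true on the source table.
spark
  .readStream
  .format("delta")
  .option("readChangeFeed", "true")
  .load("/mnt/data-lake/data/bronze/accounts")
  .filter("_change_type = 'insert'")
  .drop("_change_type", "_commit_version", "_commit_timestamp")
  .writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "/mnt/data-lake/tmp/chkpnt_accounts_inserts_cdf")  // placeholder; a new query needs its own checkpoint
  .option("path", "/mnt/data-lake/tmp/accounts_inserts")
  .start()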