
Here is my code. The `writeStream` is writing records in Parquet format rather than Delta, even though I specified `format("delta")`.

```
spark
  .readStream
  .format("delta")
  .option("latestFirst", "true")
  .option("ignoreDeletes", "true")
  .option("ignoreChanges", "true")
  .load("/mnt/data-lake/data/bronze/accounts")
  .writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "/mnt/data-lake/tmp/chkpnt_accounts_inserts")
  .option("path", "/mnt/data-lake/tmp/accounts_inserts")
  .start()
```
Don Sam
  • "delta format"? There's no "delta format", just Parquet data files plus a transaction log in the `_delta_log` directory. If that directory exists, you're likely using Delta. Can you show the files and directories in the `/mnt/data-lake/tmp/accounts_inserts` directory? – Jacek Laskowski Jan 10 '20 at 14:56
  • You are right. By "delta" I meant the format name we pass, as in `format("delta")`. I understand the data is still stored as Parquet. However, my problem was resolved on the next run, and the `_delta_log` directory got created. No clue why it did not happen the first time! – Don Sam Jan 11 '20 at 02:35
  • Looks like you've sorted it out yourself. If there's anything else we can help with, let us know. If not, do you mind if I close the question (as a user error)? – Jacek Laskowski Jan 11 '20 at 19:25
  • Sure, we can close this question. Thanks. – Don Sam Jan 11 '20 at 19:32
  • @DonSam Would you like to post that as an answer so you can mark this question as answered? – CHEEKATLAPRADEEP Feb 04 '20 at 04:17

1 Answer


Sharing the answer based on the comments from Jacek Laskowski and the original poster.

There's no separate "delta format": a Delta table is Parquet data files plus a transaction log in the `_delta_log` directory. If that directory exists, you're likely looking at a Delta table.
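To apply that check to an output directory, here is a minimal sketch in plain Python. It assumes the table path is reachable through local file APIs, and the helper names `looks_like_delta_table` and `summarize_table_dir` are hypothetical. Like the comment above, it is a heuristic: the presence of `_delta_log` suggests a Delta table, it does not validate the log.

```python
from pathlib import Path


def looks_like_delta_table(table_path: str) -> bool:
    """Heuristic: treat a directory as a Delta table if it has a _delta_log subdirectory.

    The data files themselves are ordinary Parquet files either way.
    """
    return (Path(table_path) / "_delta_log").is_dir()


def summarize_table_dir(table_path: str) -> dict:
    """Count Parquet data files and report whether a transaction log exists."""
    root = Path(table_path)
    if not root.is_dir():
        raise FileNotFoundError(table_path)
    # rglob also finds files in partition subdirectories (col=value/part-*.parquet).
    parquet_files = [
        p for p in root.rglob("*.parquet")
        # Skip Delta checkpoint files, which live inside _delta_log and are also Parquet.
        if "_delta_log" not in p.parts
    ]
    return {
        "parquet_files": len(parquet_files),
        "has_delta_log": looks_like_delta_table(table_path),
    }
```

For the sink in the question this would be called with something like `summarize_table_dir("/dbfs/mnt/data-lake/tmp/accounts_inserts")`; the `/dbfs` prefix is an assumption about how the mount is exposed to local file APIs (for example on Databricks). Parquet files with `has_delta_log` set to `False` match the symptom described in the question.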

I meant "delta" as in the format name we pass to `format("delta")`. I understand the data is still stored as Parquet. However, my problem was resolved on the next run, and the `_delta_log` directory got created.
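Because the fix amounted to `_delta_log` appearing, it can help to confirm that the log actually holds commits, not just that the directory exists. Delta names each commit file after the table version, zero-padded to 20 digits (for example `00000000000000000000.json` for version 0). The sketch below, in plain Python with a hypothetical helper name `committed_versions`, lists those versions from file names only; it does not parse the log contents.

```python
import re
from pathlib import Path
from typing import List

# Commit files are named <version>.json, with the version zero-padded to 20 digits.
_COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def committed_versions(table_path: str) -> List[int]:
    """Return the sorted table versions that have a commit file in _delta_log.

    An empty list means there is no log directory or no commit file yet.
    Checksum (.crc), checkpoint, and _last_checkpoint files are ignored.
    """
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return []
    versions = []
    for entry in log_dir.iterdir():
        match = _COMMIT_FILE.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)
```

If this returns an empty list for the sink path after the stream has been running for a while, that might suggest no micro-batch has been committed to the table yet.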

CHEEKATLAPRADEEP