
Let's say I'm at the point where delta log of delta table has become too big, and I'm 100% confident that it's OK to treat the current version of table as version 0 and discard delta log for good. What's the best way to clean up, reset delta log but keep the data? It's OS delta on S3.

Alex Ott
Andrii Black

1 Answer


So, I will say it, but think before you do it.

  • Just read the Delta folder's data as parquet and write it to another location in Delta format (mode: append). That creates a new table with a fresh history.
  • Another way is to simply delete the log files in the Delta folder and then run DeltaTable.convertToDelta. There are a lot of issues with this approach, but if the table was only ever written in append mode, it will work fine.
  • You can also change the log files manually, but if you don't understand how to do that, don't try it on real data, only on generated data. Keep in mind that if you work on Databricks, Spark has a custom metastore of which the Delta table is a part; they probably copy metadata from the Delta transaction log into that metastore, so you would need to check whether this is possible at all. I don't have Databricks to verify it, so it's up to you.
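The first two options above can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: the paths are placeholders, and it assumes a Spark session with the delta-spark package configured and working S3 credentials. One caveat worth noting: if the table ever saw deletes or updates, reading the folder as raw parquet will resurrect files that were logically removed but not yet vacuumed, so reading via the Delta log (option 1 below) is the safer variant when the log is still intact.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Assumes delta-spark is on the classpath; these two configs enable Delta support.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

src = "s3a://my-bucket/tables/old_table"  # placeholder: existing Delta table
dst = "s3a://my-bucket/tables/new_table"  # placeholder: empty target location

# Option 1: read the current snapshot (via the log, so removed-but-not-vacuumed
# files are skipped) and rewrite it elsewhere as a brand-new Delta table whose
# history starts at version 0.
(spark.read.format("delta").load(src)
      .write.format("delta").mode("append").save(dst))

# Option 2: after manually deleting the _delta_log directory under `src`,
# re-register the remaining parquet files as version 0 of a new log in place.
# Only safe if the table was append-only, for the reason described above.
# DeltaTable.convertToDelta(spark, f"parquet.`{src}`")
```

After either option, remember to repoint anything that references the old path (or metastore entry) at the new table before dropping the old one.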