Let's say I'm at the point where the delta log of a Delta table has become too big, and I'm 100% confident that it's OK to treat the current version of the table as version 0 and discard the delta log for good. What's the best way to clean up and reset the delta log while keeping the data? It's OSS Delta on S3.
1 Answer
I'll describe the options, but think carefully before you do any of this.
- Read the data files in the Delta folder as plain Parquet and write them to another location in Delta format (mode: append). That creates a new table with a fresh history; see the first sketch below.
- Another way is to delete the log files in the Delta folder and then run `DeltaTable.convertToDelta` on what's left (second sketch below). There are quite a few issues with this approach, but if the table has only ever been written to in append mode, it will work fine.
- You can also edit the log files by hand, but if you don't understand exactly how to do it, don't try it on real data, only on generated test data.

Keep in mind that if you work on Databricks, Spark has a custom metastore of which Delta tables are a part; they probably copy metadata from the Delta transaction log into that metastore, so you would need to check whether this is even possible there. I don't have Databricks to try it, so it's up to you.
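Here is a minimal sketch of the first option, assuming Spark is configured with the delta-spark package and using hypothetical S3 paths (replace `old_path` and `new_path` with your own). It also assumes the old folder has been vacuumed, so the Parquet files on disk all belong to the current version of the table.

```python
# Sketch of option 1: copy the data to a new location as a fresh Delta table.
# Paths are hypothetical; assumes only current-version Parquet files remain (e.g. after VACUUM).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("reset-delta-history")
    # delta-spark must be on the classpath, with these two configs set for Delta writes
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

old_path = "s3://my-bucket/old-delta-table"  # hypothetical
new_path = "s3://my-bucket/new-delta-table"  # hypothetical

# Read the data files directly as Parquet, ignoring the old _delta_log ...
df = spark.read.parquet(old_path)

# ... and append them to a new location as a Delta table.
# The new table's history starts from scratch at version 0.
df.write.format("delta").mode("append").save(new_path)
```

And a sketch of the second option, reusing the `spark` session from above and again with a hypothetical path. It assumes the `_delta_log` directory has already been deleted and that the table is unpartitioned; for a partitioned table, `convertToDelta` also needs the partition schema as a DDL string.

```python
# Sketch of option 2: rebuild the transaction log in place after deleting _delta_log.
from delta.tables import DeltaTable

table_path = "s3://my-bucket/my-delta-table"  # hypothetical

# convertToDelta scans the Parquet files under the path and writes a fresh
# _delta_log, so the table's history restarts at version 0.
DeltaTable.convertToDelta(spark, f"parquet.`{table_path}`")

# For a partitioned table, pass the partition columns as a DDL string, e.g.:
# DeltaTable.convertToDelta(spark, f"parquet.`{table_path}`", "event_date DATE")
```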
