Keeping many historical metadata files in Apache Iceberg helps produce a linear history of table versions and ensures that concurrent writes are not lost.

Apache Iceberg has a table write property called:

write.metadata.previous-versions-max

It is the maximum number of previous version metadata files to keep before deleting after commit (https://iceberg.apache.org/docs/latest/configuration/#write-properties).

The documented default is 100.

The documentation also says:

Iceberg keeps track of table metadata using JSON files. Each change to a table produces a new metadata file to provide atomicity.

Old metadata files are kept for history by default. Tables with frequent commits, like those written by streaming jobs, may need to regularly clean metadata files.
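To make the question concrete, here is my current mental model as a small pure-Python sketch. It is not Iceberg code and may well be wrong. I am assuming that each commit moves the previously current metadata file into a log of previous versions, that the log is trimmed to write.metadata.previous-versions-max entries, and that trimmed files are only physically deleted when write.metadata.delete-after-commit.enabled (listed on the same configuration page) is turned on. The commit function and the file names are made up for illustration.

```python
def commit(current, previous, new_file,
           previous_versions_max=100, delete_after_commit=False):
    """Toy model of one table commit (hypothetical, not the Iceberg API).

    Returns (new_current, trimmed_previous_log, files_deleted).
    """
    # The file that was current becomes the newest "previous version".
    log = previous + ([current] if current is not None else [])

    # Trim the oldest entries once the log exceeds the configured maximum.
    overflow = max(len(log) - previous_versions_max, 0)
    dropped, kept = log[:overflow], log[overflow:]

    # Assumption: dropped files are only removed from storage when the
    # delete-after-commit flag is enabled; otherwise they are just untracked.
    deleted = dropped if delete_after_commit else []
    return new_file, kept, deleted


# Six commits with a small limit, to see the trimming happen.
current, previous, removed = None, [], []
for v in range(1, 7):
    current, previous, deleted = commit(
        current, previous, f"v{v}.metadata.json",
        previous_versions_max=3, delete_after_commit=True,
    )
    removed += deleted

print(current)   # v6.metadata.json
print(previous)  # ['v3.metadata.json', 'v4.metadata.json', 'v5.metadata.json']
print(removed)   # ['v1.metadata.json', 'v2.metadata.json']
```

If this model is roughly right, lowering the number would mostly save storage and listing overhead for tables with very frequent commits, and raising it would keep a longer metadata history, but I would like confirmation.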

How does it work under the hood and what do I actually gain by reducing or increasing this number?

wbrycki