Delta Lake can convert existing Parquet data to a Delta table by "simply" adding its own metadata: the _delta_log directory.

https://docs.delta.io/2.2.0/delta-utility.html#convert-a-parquet-table-to-a-delta-table

-- Convert partitioned Parquet table at path '<path-to-table>' and partitioned by integer columns named 'part' and 'part2'
CONVERT TO DELTA parquet.`<path-to-table>` PARTITIONED BY (part int, part2 int)

That is really convenient, since it's a zero-copy operation (I believe my understanding is right, based on the source code here).

Does Iceberg share the same feature?

Alex Ott
YFl
You're correct that Delta Lake's CONVERT TO DELTA functionality is in-place and zero-copy for all APIs, e.g. [SQL](https://docs.databricks.com/sql/language-manual/delta-convert-to-delta.html#syntax), [Python](https://docs.delta.io/latest/api/python/index.html#delta.tables.DeltaTable.convertToDelta), [Scala](https://docs.delta.io/latest/delta-utility.html#convert-a-parquet-table-to-a-delta-table&language-scala), etc. This makes it a very efficient operation, because you only generate a small amount of metadata in the form of the Delta transaction log instead of copying the underlying files. – Jim Hibbard Mar 30 '23 at 22:48

1 Answer

The nearest equivalent to Delta Lake's convertToDelta method, described here, is Iceberg's migrate procedure. Iceberg also has an add_files procedure, which attempts to directly add files from a Hive or file-based table into a given Iceberg table. It should be used with care; from the Iceberg docs:

This procedure will not analyze the schema of the files to determine if they actually match the schema of the Iceberg table. Upon completion, the Iceberg table will then treat these files as if they are part of the set of files owned by Iceberg. This means any subsequent expire_snapshot calls will be able to physically delete the added files.

This could create inconsistencies if those files are still owned by a table in another metastore. It isn't an issue if you're planning a one-way migration off the original metastore; in that scenario it should be an efficient conversion.

edit: I should also mention Iceberg's snapshot procedure, which lets you test a migration in a lightweight way before converting, since it leaves the source table untouched.
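To make the three options concrete, here is a minimal sketch that only builds the Spark SQL CALL statements for them. The procedure names and argument shapes follow the Iceberg Spark procedures docs as I understand them; the helper functions, the catalog name, the table names, and the path are hypothetical placeholders, and running the statements assumes a Spark session already configured with an Iceberg catalog. As far as I know, migrate and snapshot expect a table that is registered in the catalog, while add_files can also point at a plain Parquet directory.

```python
# Sketch only: builds the Spark SQL CALL statements for Iceberg's
# migrate, snapshot and add_files procedures. The helper names and all
# catalog, table and path values are hypothetical placeholders.


def migrate_sql(catalog: str, table: str) -> str:
    """Statement that replaces an existing table with an Iceberg table in place."""
    return f"CALL {catalog}.system.migrate('{table}')"


def snapshot_sql(catalog: str, source_table: str, table: str) -> str:
    """Statement that creates a test Iceberg table without changing the source."""
    return f"CALL {catalog}.system.snapshot('{source_table}', '{table}')"


def add_files_sql(catalog: str, table: str, parquet_path: str) -> str:
    """Statement that registers existing Parquet files in an Iceberg table.

    No schema check is performed by the procedure, so the target table's
    schema is assumed to already match the files.
    """
    return (
        f"CALL {catalog}.system.add_files("
        f"table => '{table}', "
        f"source_table => '`parquet`.`{parquet_path}`')"
    )


def run_all(spark, statements):
    """Execute each statement on an existing SparkSession (assumed configured)."""
    for stmt in statements:
        spark.sql(stmt)


if __name__ == "__main__":
    # Print the statements instead of running them; pass them to
    # run_all(spark, ...) once a session with Iceberg is available.
    for stmt in (
        snapshot_sql("spark_catalog", "db.sample", "db.sample_iceberg_test"),
        migrate_sql("spark_catalog", "db.sample"),
        add_files_sql("spark_catalog", "db.target", "/path/to/table"),
    ):
        print(stmt)
```

A reasonable order is to run snapshot first, check the resulting table, and only then run migrate on the original.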

Jim Hibbard