I have a use case where I process data into a Delta Lake table partition by partition. The partitions are disjoint, meaning they are independent of one another. Processing data into a specific partition involves several operations: inserts, updates, and deletes. If one of these operations fails, I need to restore only that partition to its previous successful state. According to the documentation, Delta Lake restores work at the table level. Is there a way to restore a single partition?

I tried restoring at the table level, but that doesn't work when there are concurrent writes to other partitions.

1 Answer

You can restore a single partition to a previous version with these steps:

1. Describe the history of the table and pick the version you want to restore to.

desc history <delta_table>

2. Read the Delta table at that version, filtered to the partition.

select * from <delta_table>@v<version_number> where <partition_col>='<partition_value>'

3. Overwrite only that partition with the data from that version:

INSERT INTO TABLE <delta_table> REPLACE WHERE <partition_col>='<partition_value>' select * from <delta_table>@v<version_number> where <partition_col>='<partition_value>'
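
If you run this from a notebook or job, the steps above can be wrapped in a small helper. This is only a minimal sketch, not an official Delta Lake API: `restore_partition` is a hypothetical name, `spark` is assumed to be an active SparkSession (or anything with a `sql` method), the partition column is assumed to hold string values, and the environment is assumed to support the `REPLACE WHERE` and `@v<version>` syntax used above.

```python
def restore_partition(spark, table, partition_col, partition_value, version):
    """Overwrite one partition of `table` with its contents at `version`.

    Hypothetical helper: it only builds and runs the statement from step 3.
    """
    value = str(partition_value)
    # Assumption: values are simple strings. Quotes and backslashes are
    # rejected rather than escaped, to avoid building a broken predicate.
    if "'" in value or "\\" in value:
        raise ValueError("partition value must not contain quotes or backslashes")

    predicate = f"{partition_col} = '{value}'"
    statement = (
        f"INSERT INTO TABLE {table} REPLACE WHERE {predicate} "
        f"SELECT * FROM {table}@v{int(version)} WHERE {predicate}"
    )
    # The overwrite and the read of the old version happen in one statement.
    spark.sql(statement)
    return statement
```

For example, `restore_partition(spark, "sales", "region", "EU", 12)` would run the step 3 statement for the `region = 'EU'` partition of a hypothetical `sales` table, leaving the other partitions untouched.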

notNull