
I want to restore a previous version of a Delta table by first creating a copy of it in a folder named after the copy job's run date, and then restoring the Delta table from that copy.

Any suggestions here?

Here's what I'm trying:

version_timestamp = dbutils.widgets.get("widget_name")  # target timestamp from a Databricks widget
var1 = ("SELECT version FROM (DESCRIBE HISTORY dbname.tablename) "
        "WHERE to_timestamp(timestamp, 'yyyy-MM-dd HH:mm:ss') = '" + version_timestamp + "'")
version = spark.sql(var1).collect()  # list of Row(version=...) matches

If version matches version_timestamp, it creates a new folder and writes; otherwise it fails.
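Here is a minimal sketch of the flow I have in mind; the table name, paths, and date format below are hypothetical placeholders:

from datetime import datetime

run_date = datetime.utcnow().strftime("%Y-%m-%d")        # copy-job run date (placeholder format)
source_path = "dbfs:/mnt/delta/tablename"                # hypothetical table location
backup_path = f"dbfs:/mnt/backups/tablename/{run_date}"  # folder named after the run date

# 1. Copy the current table files (including _delta_log) into the dated folder
dbutils.fs.cp(source_path, backup_path, recurse=True)

# 2. Restore later by overwriting the table from the backup copy
backup_df = spark.read.format("delta").load(backup_path)
backup_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(source_path)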

  • Can you please add some more details to this question? Here is the documentation for the RESTORE command: https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-restore.html. What have you tried? Did the command cause you any issues? – Powers Sep 26 '22 at 13:51
  • @Powers, yes, I am using the Delta Lake documentation. My approach is to read the version from the Delta table history, create a new folder based on that version, and then restore using it. version_timestamp = dbutils.widgets.get("widget_name"); var1 = "SELECT version FROM (DESCRIBE HISTORY dbname.tablename) WHERE to_timestamp(timestamp,'yyyy-MM-dd HH:mm:ss') = '" + version_timestamp + "'"; version = spark.sql(var1).collect(). If version matches version_timestamp, it creates a new folder and writes; otherwise it fails. – exploding_data Sep 28 '22 at 06:07
  • I don't think you normally need to copy data when restoring to a previous version. That's one of the benefits of versioned data. Suppose you have a dataset with v1, v2, and v3. If you restore to v2, then you can still access v3 (see the sketch after these comments). So perhaps in your particular case, copying the data is needed, but I would usually recommend against this. – Powers Sep 28 '22 at 11:35
  • Yes, you are right, Delta already has that feature where you can time travel and restore, but this additional requirement is part of my use case. – exploding_data Sep 28 '22 at 11:37
  • Does the answer provided by matkurek solve your issue? If so, can you please accept it? – Powers Sep 28 '22 at 11:38
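To illustrate the point about versioned data made in the comments, here is a small sketch, assuming a hypothetical table db.events that already has versions 0 through 3:

from delta.tables import DeltaTable

# RESTORE is itself recorded as a new commit, so earlier history survives
DeltaTable.forName(spark, "db.events").restoreToVersion(2)

# Version 3 is still reachable via time travel after the restore
v3_df = spark.sql("SELECT * FROM db.events VERSION AS OF 3")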

1 Answer


First, create a folder with a backup copy; adjust the appended date format to whatever you want:

import datetime

path = 'dbfs:/mnt/your_dataset_path'
bck_path = path + '_backup_' + datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")

# recurse=True is needed to copy the whole directory tree, not just the top level
dbutils.fs.cp(path, bck_path, recurse=True)
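As a quick sanity check (not part of the original answer), you can list the backup folder to confirm the table files, including the _delta_log directory, were copied:

# In a Databricks notebook, verify the copy succeeded before restoring
display(dbutils.fs.ls(bck_path))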

Then restore using time travel, either using the backup path or the original:

from delta.tables import DeltaTable

deltaTable = DeltaTable.forPath(spark, '/mnt/your_dataset_path')  # for path-based tables, or
deltaTable = DeltaTable.forName(spark, 'table_name')              # for Hive-metastore tables

deltaTable.restoreToVersion(0)               # restore the table to its oldest version

deltaTable.restoreToTimestamp('2022-09-14')  # or restore to a specific timestamp
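If you prefer SQL, the equivalent of the restore calls above is the RESTORE command (the table name is a placeholder):

# SQL equivalent of the Python restore calls above
spark.sql("RESTORE TABLE table_name TO VERSION AS OF 0")
spark.sql("RESTORE TABLE table_name TO TIMESTAMP AS OF '2022-09-14'")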
– matkurek