1

I am trying to build a process of moving changed data from "Silver" Tables to "Gold" only processing changed records in Silver using Spark in Synapse , but its proving near impossible.

There is a feature in Delta Lake (v2 and higher) called "Change Data Feed" and it is exactly what I am looking for (https://docs.delta.io/latest/delta-change-data-feed.html) , but there seems to be one part missing in Synapse, and that is the Table Valued function called "table_changes" that allows you to query changes for a given Delta Table that has had the option for change detection set on.

Synapse allows you to activate Change Data Feed , you can describe the history of the delta table, it just seems that the Table Valued function called table_changes has not been implemented.

I am looking for any advice, or alternatives in Synapse for detecting changes in Delta Tables.

To reproduce , create a Spark Notebook in Synapse and execute the following code (PySpark)

Create a basic silver table in Delta



countries = [("USA", 10000, 20000), ("India", 1000, 1500), ("UK", 7000, 10000), ("Canada", 500, 700) ]
columns = ["Country","NumVaccinated","AvailableDoses"]
spark.createDataFrame(data=countries, schema = columns).write.format("delta").mode("overwrite").saveAsTable("silverTable")

Check that you can query the newly created table

%%sql
SELECT * FROM silverTable

Set the setting for enabling changing data feed

%%sql
ALTER TABLE silverTable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

Lets add in some changed data for the purposes of extracing said data

new_countries = [("Australia", 100, 3000)]
spark.createDataFrame(data=new_countries, schema = columns).write.format("delta").mode("append").saveAsTable("silverTable")

Query the changed table to view changes

%%sql
-- view the changes
SELECT * FROM table_changes('silverTable', 2, 5) order by _commit_timestamp

This produces an error :

Error: could not resolve table_changes to a table-valued function; line 2 pos 14 org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions$$anonfun$apply$1.$anonfun$applyOrElse$2(ResolveTableValuedFunctions.scala:37)

Francois
  • 11
  • 1
  • I tried the same thing in synapse and databricks. It worked in Databricks. But It doesn't work in Synapse. AFAIK, It is not supported – Aswin Mar 14 '23 at 07:19

2 Answers2

1

I tried writing the below function. It works for now:-

def fn_table_changes(arg1,arg2, *argv):    
    tablename = arg1    
    list1 = arg2    
    list2=''    
    for arg in argv:    
        list2 = argv[0]    
    if isinstance(list1, int) == True:    
        if isinstance(list2, int) == True:    
            df = spark.read.format("delta").option("readChangeFeed", "true").option("startingVersion", list1).option("endingVersion", list2).table(tablename)    
            return(df)    
        else:    
            #print("st version")    
            df = spark.read.format("delta").option("readChangeFeed", "true").option("startingVersion", list1).table(tablename)    
            return(df)    
    else:    
        if list2 != '':    
            #print("st&en timestamp")    
            df = spark.read.format("delta").option("readChangeFeed", "true").option("startingTimestamp", list1).option("endingTimestamp", list2).table(tablename)    
            return(df)    
        else:    
            #print("st timestamp")    
            df = spark.read.format("delta").option("readChangeFeed", "true").option("startingTimestamp", list1).table(tablename)    
            return(df)

and then we can call this:-

df=fn_table_changes('silverTable', 1)    
df.createOrReplaceTempView("table_changes")
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Vidhya
  • 11
  • 1
0

I tried the same script in synapse environment and databricks environment. It is not working in Synapse environment. Instead, you can retrieve data on the operations, user, timestamp for each write to a Delta table by running the history command.

%%sql
DESCRIBE  HISTORY silverTable;

enter image description here

This gave the history of the changes done in silverTable.

Aswin
  • 4,090
  • 2
  • 4
  • 16