I have a query which gives me the latest version of a Delta table:

(Query 1):

%sql
SELECT VERSION FROM (DESCRIBE HISTORY dbo.customers LIMIT 1)

This gives me output like this:

(screenshot: output of the above query, a single row containing the latest version number)

I have a query for table_changes, which I'm using this way

(Query 2):

%sql
select * from table_changes('dbo.customers', 3)

It gives me the records of whatever changed in version 3, which is the latest version.

Now, to do this in one query instead of two, I attempted the following

(Query 3):

%sql
select * from table_changes('dbo.customers', SELECT VERSION FROM (DESCRIBE HISTORY dbo.customers LIMIT 1))

It gives me a syntax error.

I understand that (Query 1) returns a table, but I just want to use the value of its first row as the second argument of the table_changes function.

How do I do this, or is there a better way to get the result in a single query?

Deval Modi

1 Answer

You can use a widget in Databricks to pass the value into table_changes, as below.

%python
import json

# Grab the latest version from the table history; DESCRIBE HISTORY returns
# the newest entry first, so LIMIT 1 gives the current version
result = spark.sql("SELECT VERSION FROM (DESCRIBE HISTORY delta.`/sampDelta/names/` LIMIT 1)").toJSON()
j_obj = json.loads(result.collect()[0])
print(j_obj['VERSION'])

# Store the value in a widget so it can be referenced from SQL cells
dbutils.widgets.text('vers', str(j_obj['VERSION']))
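
The JSON round trip isn't strictly needed, by the way; you can also read the scalar straight off the first Row (a minimal sketch, same sample path):

%python
# Same lookup, pulling the value out of the first Row by position
latest = spark.sql("SELECT VERSION FROM (DESCRIBE HISTORY delta.`/sampDelta/names/` LIMIT 1)").first()[0]
dbutils.widgets.text('vers', str(latest))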

Then use that value in the function, cast to an integer (widget values are strings):

%sql
select * from table_changes('delta.`/sampDelta/names/`', int('${vers}'))
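
For completeness: if a pure Python cell were acceptable, the two queries could also be collapsed into a single cell, since table_changes takes a plain integer literal (a sketch, same sample path, with a hypothetical variable name):

%python
# Look up the latest version, then interpolate it into table_changes as an integer
latest = spark.sql("SELECT VERSION FROM (DESCRIBE HISTORY delta.`/sampDelta/names/` LIMIT 1)").first()[0]
display(spark.sql(f"SELECT * FROM table_changes('delta.`/sampDelta/names/`', {int(latest)})"))

This is still not a single SQL statement, of course, just a single cell.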

JayashankarGS
  • This is something I have already tried. What I asked is whether there is a way to do it in a single SQL statement. But as it turns out, the 2nd parameter accepts only two things, i.e. table_changes(table, version/timestamp). So this won't be possible, I think, unless Databricks updates this function in the future. – Deval Modi Jun 09 '23 at 07:36
  • Yes, we cannot give a select expression in the `table_changes` function, as it accepts only `BIGINT` and `datetime`. Is the PySpark solution OK for you? – JayashankarGS Jun 09 '23 at 08:14
  • As I mentioned earlier, I have already tried the PySpark way, and that's not what I was looking for. I have to use it in a SQL cell in a pipeline. But thanks for the answer anyway! – Deval Modi Jun 09 '23 at 18:53