
I used a little PySpark code to create a Delta table in a Synapse notebook.

partial code:

# Read file(s) in spark data frame
sdf = spark.read.format('parquet').option("recursiveFileLookup", "true").load(source_path)

# Create new delta table with new data
sdf.write.format('delta').save(delta_table_path)

But now I want to use a different Synapse notebook with Spark SQL to read that Delta table (including its history) that is stored in my Data Lake Gen2. I tried the createOrReplaceTempView option, but that does not allow me to see the history.

Partial code (block 1)

%%pyspark
ProductModelProductDescription = spark.read.format("delta").load(f'abfss://{blob_account_name}@{container}/Silver/{table}')
ProductModelProductDescription.createOrReplaceTempView(table)

partial code (block 2)

SELECT * FROM ProductModelProductDescription

partial code (block 3)

DESCRIBE HISTORY ProductModelProductDescription

This gives an error: Table or view 'productmodelproductdescription' not found in database 'default'

In the video from Azure Synapse Analytics they show how to work with history, but it doesn't show where the table is stored or how it was created; it's already there at the beginning. https://www.youtube.com/watch?v=v1h4MnFRM5w&ab_channel=AzureSynapseAnalytics

I can create a DeltaTable object in pySpark

%%pyspark
# Import modules
from delta.tables import DeltaTable
from notebookutils import mssparkutils

path = 'abfss://mysource@mydatalake.dfs.core.windows.net/Silver/ProductModelProductDescription'

delta_table = DeltaTable.forPath(spark, path)

But I'm not sure how to continue in Spark SQL with this object.

Joost

2 Answers


You don’t want a DataFrame; you want a DeltaTable. DataFrame is a generic API, and a DeltaTable is the specific API for Delta-specific stuff.

So use DeltaTable.forName or DeltaTable.forPath instead of spark.read.
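
For instance, a minimal sketch (reusing the path from the question) that gets the Delta-specific API and shows the table history from PySpark:

%%pyspark
from delta.tables import DeltaTable

# Path to the existing Delta table in the data lake (from the question)
path = 'abfss://mysource@mydatalake.dfs.core.windows.net/Silver/ProductModelProductDescription'

# DeltaTable exposes Delta-specific operations that a plain DataFrame does not
delta_table = DeltaTable.forPath(spark, path)

# history() returns a DataFrame with one row per table version
delta_table.history().show(truncate=False)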

In order to access the Delta table from SQL, you have to register it in the metastore, e.g.

sdf.write.format("delta").mode("overwrite").saveAsTable("ProductModelProductDescription")

instead of

sdf.write.format('delta').save(delta_table_path)
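
Once the table is registered, Spark SQL resolves it by name, so a follow-up SQL cell along these lines should work (a sketch; the table name matches the saveAsTable call above):

%%sql
-- The name now resolves in the metastore, so the history is visible from SQL
DESCRIBE HISTORY ProductModelProductDescription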

See: https://docs.delta.io/latest/api/java/io/delta/tables/DeltaTable.html

David Browne - Microsoft
  • Thanks! I can create a DeltaTable object in PySpark, but I'm not sure how to continue in Spark SQL (added that code in the original question) – Joost Mar 17 '23 at 14:23

You have to add your Delta tables to the Lake Database of Synapse:

%%pyspark
spark.sql(f'CREATE TABLE IF NOT EXISTS Bronze.{table_name} USING DELTA LOCATION \'{delta_table_path}\'')

or

%%sql
CREATE TABLE IF NOT EXISTS Silver.Cities
USING DELTA
LOCATION 'abfss://mysource@mydatalake.dfs.core.windows.net/Silver/Cities'


Then you can use that table name to query your Delta table (instead of using the entire data lake path).
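
For example, a sketch of a SQL cell against the Cities table registered above (a plain SELECT by name works the same way):

%%sql
-- Describe the history of the Delta table through its Lake Database name
DESCRIBE HISTORY Silver.Cities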

More details: https://microsoft-bitools.blogspot.com/2023/03/synapse-add-existing-delta-table-to.html

Joost