I have a requirement in my project where I am implementing an SCD Type 2 table in Azure SQL DW. I am able to insert new records using the JDBC connector, but I also need to mark old records as "expired" and update other records to their new values.
The recommended pattern is to either use the Databricks Azure SQL DW connector to bulk-load a staging table, or to write the new data to files in Blob Storage or Azure Data Lake and expose the new rows through a PolyBase external table.
Then, once the new data is either loaded into a staging table or available in a PolyBase external table, write a T-SQL stored procedure to "update old records as 'expired' and update other records as per updated values".
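For the staging-table route, a minimal sketch of the bulk load with the Databricks SQL DW connector might look like the following. The source view new_customer_rows, the staging table stg_customer, and the tempDir container are placeholders for your own names:

%scala
// Sketch: bulk-load incoming rows into a staging table in Azure SQL DW.
// The connector stages the data in Blob Storage (tempDir) and loads it
// with PolyBase behind the scenes. All names below are placeholders.
val newRows = spark.table("new_customer_rows")

newRows.write
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://xxxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true")
  .option("user", dbutils.secrets.get(scope = "kv", key = "sqluser"))
  .option("password", dbutils.secrets.get(scope = "kv", key = "sqlpassword"))
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("tempDir", "wasbs://container@account.blob.core.windows.net/tmp")
  .option("dbTable", "stg_customer")
  .mode("overwrite") // recreate the staging table on each run
  .save()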
Spark itself only knows how to run queries and load tables, but you have the JDBC driver installed and can use JDBC directly from Scala or Java to run an arbitrary T-SQL batch, e.g.:
%scala
import java.sql.DriverManager

// Read the SQL credentials from a Databricks secret scope.
val jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
val jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")

// Make sure the SQL Server JDBC driver is loaded.
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = "jdbc:sqlserver://xxxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
try {
  val stmt = connection.createStatement()
  // Any T-SQL batch works here, e.g. calling a stored procedure.
  val sql = """
  exec usp_someproc ...
  """
  stmt.execute(sql)
} finally {
  connection.close()
}
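If you would rather skip the stored procedure, the same statement can run the SCD Type 2 batch inline after the staging load. This is only a sketch: dim_customer, stg_customer, and the columns (customer_id, name, city, valid_from, valid_to, is_current) are hypothetical and need to match your own schema:

%scala
// Expire the current versions of changed rows, then insert the new versions.
// All table and column names below are placeholders; this assumes the
// staging table holds only new or changed rows.
val scd2Batch = """
UPDATE dim_customer
SET    is_current = 0,
       valid_to   = SYSUTCDATETIME()
WHERE  is_current = 1
  AND  customer_id IN (SELECT customer_id FROM stg_customer);

INSERT INTO dim_customer
       (customer_id, name, city, valid_from, valid_to, is_current)
SELECT  customer_id, name, city, SYSUTCDATETIME(), NULL, 1
FROM    stg_customer;
"""
stmt.execute(scd2Batch)

This runs in place of the usp_someproc call above; wrapping the same two statements in a stored procedure and calling it via exec is equivalent at runtime.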

– David Browne - Microsoft
- That sounds good to me. But is there any way to avoid a T-SQL stored procedure and do this in Databricks or some other way? – shubham nayak Jan 25 '19 at 07:59
- You can run the same T-SQL INSERT and UPDATE commands from Databricks using JDBC after loading the staging data. A stored procedure would just store those commands on the SQL Server; there's no real difference at runtime. – David Browne - Microsoft Jan 25 '19 at 14:05
- That sounds good to me, but how do you call the stored procedure from Spark? – Artur Mar 19 '19 at 18:20
- Is there no way to run just T-SQL commands from Databricks once the data has been staged in Azure SQL? – viejoEngineer May 27 '20 at 19:56
- See the updated answer for an example of running an arbitrary SQL batch. – David Browne - Microsoft May 27 '20 at 20:37