I have defined a view in an Azure Synapse serverless SQL pool that fetches data with OPENROWSET. I also have data in a table in a dedicated SQL pool.
Can I write a single SQL query that joins these two tables?
Azure Synapse Analytics really has three separate engines (at least that's the way I see it), and your SQL query has to execute from one of them, which gives you three choices:
Execute from inside the dedicated SQL pool. You will have access to your dedicated table, and depending on the definition of your serverless SQL pool OPENROWSET query, you may be able to reproduce it within dedicated, e.g. as an external table. If you've created a view in serverless, there is no way to reference it directly from dedicated at this time. Dedicated does not support OPENROWSET (at this time) in the same way serverless does.
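For illustration, here is a rough sketch of what that could look like in dedicated, against the same public parquet file used in the serverless example further down. The data source, file format and table names are my own placeholders, and the column list would need to match the actual parquet schema:
-- External data source pointing at the public pandemic data lake
CREATE EXTERNAL DATA SOURCE PandemicDataLake
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://public@pandemicdatalake.blob.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);

-- Column list abbreviated for illustration
CREATE EXTERNAL TABLE dbo.ecdc_cases_ext (
    date_rep                  DATE,
    cases                     INT,
    deaths                    INT,
    countries_and_territories NVARCHAR(256)
)
WITH (
    LOCATION = '/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
    DATA_SOURCE = PandemicDataLake,
    FILE_FORMAT = ParquetFileFormat
);

-- Now it can be joined to ordinary dedicated-pool tables
SELECT TOP 10 * FROM dbo.ecdc_cases_ext;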
Execute from inside serverless. You will not have direct access to your dedicated table, but you could dump it out using CREATE EXTERNAL TABLE AS SELECT (CETAS) into your data lake, which would then allow you to read it from serverless with another OPENROWSET query.
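A sketch of that round trip, with placeholder names throughout (storage account, container, data source); the CETAS runs in the dedicated pool, the OPENROWSET in serverless:
-- In the dedicated SQL pool: export the table to the lake as parquet
CREATE EXTERNAL TABLE dbo.nation_export
WITH (
    LOCATION = '/export/nation/',
    DATA_SOURCE = MyDataLake,        -- an external data source you've defined over your lake
    FILE_FORMAT = ParquetFileFormat  -- a parquet external file format, as above
)
AS
SELECT * FROM dbo.nation;

-- In the serverless SQL pool: read the exported files back
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/export/nation/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;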
Execute from Apache Spark pools. Use the third engine to bring the other two together. Getting access to dedicated tables from within Azure Synapse Spark notebooks is as easy as running spark.read.synapsesql with a three-part name. Here is a simple example (in Scala):
// Read a dedicated SQL pool table using the three-part name
// <database>.<schema>.<table>
val dfNation = spark.read.synapsesql("tpch_ded.dbo.nation")
dfNation.show
// Register it as a temp view so it can be referenced from Spark SQL
dfNation.createOrReplaceTempView("nation")
Getting access to a serverless object is a bit more involved, but I got this Scala example to work, inspired by Jovan Popovich here:
import java.util.Properties

// Load the SQL Server JDBC driver
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")

// The serverless (on-demand) SQL endpoint of the workspace
val jdbcHostname = "XXX-ondemand.sql.azuresynapse.net"
val jdbcPort = 1433
val jdbcDatabase = "master"

// Create the connection string
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=60;"

// Credentials for a SQL user on the serverless pool
val props = new Properties()
val jdbcUsername = "someSQLUser"
val jdbcPassword = "xxx"
props.put("user", s"${jdbcUsername}")
props.put("password", s"${jdbcPassword}")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
props.put("Driver", s"${driverClass}")

// The OPENROWSET query to run on the serverless pool
val sqlQuery = """select top 10 *
from openrowset(
    bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
    format = 'parquet') AS rows"""
// This worked but isn't using the props object
val df = spark.read.format("jdbc").
  option("url", jdbcUrl).
  option("user", s"${jdbcUsername}").
  option("password", s"${jdbcPassword}").
  option("driver", s"${driverClass}").
  option("query", sqlQuery).
  load()

df.show(false)
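For completeness, a variant that does use the props object, via the DataFrameReader.jdbc overload; the query has to be wrapped as a derived table with an alias for the JDBC source:
// Same read, but passing the credentials through the Properties bag
val df2 = spark.read.jdbc(jdbcUrl, s"($sqlQuery) q", props)
df2.show(false)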
So you could bring the two dataframes together in a notebook using a join, for example:
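Here is a minimal sketch, reusing the nation temp view from the dedicated-pool read and the serverless dataframe df above; the join key is purely illustrative, since these two sample datasets don't really share one:
// Register the serverless result next to the dedicated-pool temp view
df.createOrReplaceTempView("ecdc_cases")

// Join data from the two engines in Spark SQL (join key is illustrative)
val joined = spark.sql("""
  select n.n_name, c.cases, c.deaths
  from nation n
  join ecdc_cases c
    on upper(c.countries_and_territories) = n.n_name
""")
joined.show(false)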
The Spark pools and serverless SQL pools also have a shared metadata model, so you can create managed/unmanaged tables in Spark and they appear in serverless. Better examples here.
We're looking at using pattern 2 at the moment, which has the downside of duplication and the upside of reducing cost on the dedicated SQL pool, i.e. use dedicated to import and process data, dump it out, then pause it.
It feels like it should be simpler, right? Maybe I'm missing something, or maybe this is set to improve on the roadmap.