Where is data physically stored in Azure Synapse Dedicated SQL Pool?

Question

Documentation from Microsoft and others strongly emphasizes the separation between storage and compute in Azure Synapse Analytics. In the case of a Serverless SQL pool, it is clearly explained that the data is stored in an Azure Data Lake DSL Gen2. However, in the case of a Dedicated SQL Pool, the documentation is not explicit enough on data storage.

In a book that deals with Azure Synapse, it is stated that in the case of Dedicated SQL Pool, data is stored in Storage Nodes which are completely separate from Compute Nodes. Since this claim is not in Microsoft's documentation, I dare not trust it.

So, is there an official resource that sheds light on this question?

score 3 · Answer 1 · answered Jun 30 '22 at 13:10

This is a question that has been on my mind for a long time as well. However, I have come to the conclusion that data is actually stored in Dedicated SQL Pools.

Let me explain why I believe this.

Take a look at the documentation given here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-copy-activity-load-sql-pool

Notice that it is about loading data into a Dedicated SQL Pool. Further, to quote part of the documentation,

A dedicated SQL pool offers T-SQL based compute and storage capabilities. After creating a dedicated SQL pool in your Synapse workspace, data can be loaded, modeled, processed, and delivered for faster analytic insight.

It is said that Dedicated SQL Pools provide both compute and storage capabilities.

Furthermore, with Dedicated SQL Pools, you may already know that it is possible to create traditional tables. We can organize these tables into something along the lines of a star or snowflake schema to model our data warehouses.

Creation of such tables, however, is not possible with Serverless SQL Pools. Only the creation of metadata objects, i.e. views or external tables are allowed. This is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/on-demand-workspace-overview

To quote the relevant passage of the article,

Serverless SQL pool has no local storage, only metadata objects are stored in databases. Therefore, T-SQL related to the following concepts isn't supported:

Tables Triggers Materialized views DDL statements other than ones related to views and security DML statements

To me, the fact that tables can actually be created in Dedicated SQL Pools is further proof that the data is physically stored in them.

My final argument is around the idea of distributions. The concept is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture

This talks about how data is divided up among the compute nodes and how queries are executed in parallel on the distributions in these nodes. It would not be possible to implement this if the data was not actually stored in these nodes.

In my humble opinion, how I believe Azure Storage comes into the picture (at least, when it comes to Dedicated SQL Pools) is with regards to storing data as files in a data lake and then ingesting them into the pool for analysis.

An explanation can be found here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/overview-architecture

Yet another quote,

Serverless SQL pool allows you to query your data lake files, while dedicated SQL pool allows you to query and ingest data from your data lake files. When data is ingested into dedicated SQL pool, the data is sharded into distributions to optimize the performance of the system.

This is where Polybase comes into play. You can define various data loading patterns (into Dedicated SQL Pools) using Polybase as explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview

score 1 · Answer 2 · answered Jan 27 '23 at 16:30

The Microsoft documentation on Design tables using dedicated SQL pool in Azure Synapse Analytics, found at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview, states the following:

Table persistence: Tables store data either permanently in Azure Storage, temporarily in Azure Storage, or in a data store external to dedicated SQL pool.

Regular table A regular table stores data in Azure Storage as part of dedicated SQL pool...

Where is data physically stored in Azure Synapse Dedicated SQL Pool?

2 Answers2