This is a question that has been on my mind for a long time as well. However, I have come to the conclusion that data is actually stored in Dedicated SQL Pools.
Let me explain why I believe this.
Take a look at the documentation given here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-copy-activity-load-sql-pool
Notice that it is about loading data into a Dedicated SQL Pool. Further, to quote part of the documentation,
A dedicated SQL pool offers T-SQL based compute and storage
capabilities. After creating a dedicated SQL pool in your Synapse
workspace, data can be loaded, modeled, processed, and delivered for
faster analytic insight.
It is said that Dedicated SQL Pools provide both compute and storage capabilities.
Furthermore, with Dedicated SQL Pools, you may already know that it is possible to create traditional tables. We can organize these tables into something along the lines of a star or snowflake schema to model our data warehouses.
Creation of such tables, however, is not possible with Serverless SQL Pools. Only the creation of metadata objects, i.e. views or external tables are allowed. This is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/on-demand-workspace-overview
To quote the relevant passage of the article,
Serverless SQL pool has no local storage, only metadata objects are
stored in databases. Therefore, T-SQL related to the following
concepts isn't supported:
Tables Triggers Materialized views DDL statements other than ones
related to views and security DML statements
To me, the fact that tables can actually be created in Dedicated SQL Pools is further proof that the data is physically stored in them.
My final argument is around the idea of distributions. The concept is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
This talks about how data is divided up among the compute nodes and how queries are executed in parallel on the distributions in these nodes. It would not be possible to implement this if the data was not actually stored in these nodes.
In my humble opinion, how I believe Azure Storage comes into the picture (at least, when it comes to Dedicated SQL Pools) is with regards to storing data as files in a data lake and then ingesting them into the pool for analysis.
An explanation can be found here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/overview-architecture
Yet another quote,
Serverless SQL pool allows you to query your data lake files, while
dedicated SQL pool allows you to query and ingest data from your data
lake files. When data is ingested into dedicated SQL pool, the data is
sharded into distributions to optimize the performance of the system.
This is where Polybase comes into play. You can define various data loading patterns (into Dedicated SQL Pools) using Polybase as explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview