
If I run a CREATE EXTERNAL TABLE cetasTable AS SELECT command and then run:

EXPLAIN
select * from cetasTable

I see in the distributed query plan:

<operation_cost cost="4231.099968" accumulative_cost="4231.099968" average_rowsize="2056" output_rows="428735" />

It seems to know the correct row count. However, there are no statistics created on that table, as this query returns zero rows:

select * from sys.stats where object_id = object_id('cetasTable')

If I already have files in blob storage and I run a CREATE EXTERNAL TABLE cetTable command and then run:

EXPLAIN
select * from cetTable 

The distributed query plan shows SQL DW thinks there are only 1000 rows in the external table:

  <operation_cost cost="4.512" accumulative_cost="4.512" average_rowsize="940" output_rows="1000" />

Of course I can create statistics to ensure SQL DW knows the right row count when it creates the distributed query plan. But can someone explain how it knows the correct row count some of the time and where that correct row count is stored?

GregGalloway

1 Answer


What you are seeing is the difference between a table created using CxTAS (CTAS, CETAS or CRTAS) and CREATE TABLE.

When you run CREATE TABLE, the row count and page count values are fixed because the table is empty. If memory serves, the fixed values are 1000 rows and 100 pages. When you create a table with CTAS, they are not fixed: the actual values are known to the CTAS command, as it has just created and populated the table in a single statement. Consequently, the metadata correctly reflects the table SIZE when a CxTAS is used. This is good. The APS / SQLDW cost-based optimizer can immediately make better estimations for MPP plan generation based on table SIZE when a table has been created via CxTAS as opposed to CREATE TABLE.
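For illustration, here is a minimal sketch of the two approaches (the schema, table and column names are hypothetical):

CREATE TABLE dbo.FactSales_Empty
(
    SaleKey     bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    SaleAmount  decimal(18,2) NULL
)
WITH (DISTRIBUTION = HASH(SaleKey), CLUSTERED COLUMNSTORE INDEX);
-- created empty, so the shell database records the fixed defaults
-- (roughly 1000 rows / 100 pages)

CREATE TABLE dbo.FactSales_Ctas
WITH (DISTRIBUTION = HASH(SaleKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT SaleKey, CustomerKey, SaleAmount
FROM   stg.Sales;
-- created and populated in one statement, so the real row and page
-- counts are recorded in the shell database metadata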

Having an accurate understanding of table size is important.

Imagine you have a table created using CREATE TABLE, and 1 billion rows are then inserted into it with INSERT. The shell database still thinks that the table has 1000 rows and 100 pages, which is clearly not the case. The reason is that the table size attributes are not automatically updated at this time.

Now imagine that a query is fired that requires data movement on this table. Things may begin to go awry. You are now more likely to see the engine make poor MPP plan choices (typically using BROADCAST rather than SHUFFLE) because, amongst other things, it does not understand the table size.
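As a rough sketch of how you might spot this (the dimension table and its columns are hypothetical), you can EXPLAIN a join against the under-estimated table and review the data movement steps and estimated row counts in the returned plan XML:

EXPLAIN
SELECT d.Region, COUNT_BIG(*) AS sale_count
FROM   dbo.FactSales_Empty AS f
JOIN   dbo.DimCustomer     AS d ON d.CustomerKey = f.CustomerKey
GROUP BY d.Region;
-- in the returned XML, check which move operation was chosen for the join
-- (broadcast vs shuffle) and whether the estimated row counts look sensible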

What can you do to improve this?

You create at least one column-level statistics object per table. Generally speaking, you will create statistics objects on all columns used in JOINs, GROUP BYs, WHEREs and ORDER BYs in your queries. I will explain the underlying process for statistics generation in a moment. I just want to emphasise that the call to action here is to ensure that you create and maintain your statistics objects.
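Continuing the hypothetical example above, a minimal sketch of that call to action looks like this:

CREATE STATISTICS stat_FactSales_SaleKey     ON dbo.FactSales_Ctas (SaleKey);
CREATE STATISTICS stat_FactSales_CustomerKey ON dbo.FactSales_Ctas (CustomerKey);
CREATE STATISTICS stat_FactSales_SaleAmount  ON dbo.FactSales_Ctas (SaleAmount);

-- ...and keep them current after large loads or modifications
UPDATE STATISTICS dbo.FactSales_Ctas;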

When CREATE STATISTICS is executed for a column, three events actually occur.

1) Table level information is updated on the CONTROL node

2) Column level statistics object is created on every distribution on the COMPUTE nodes

3) Column level statistics object is created and updated on the CONTROL node

1) Table level information is updated on the CONTROL node

The first step is to update the table-level information. To do this, APS / SQLDW executes DBCC SHOW_STATISTICS (table_name) WITH STAT_STREAM against every physical distribution, merging the results and storing them in the catalog metadata of the shell database. Row count is held in sys.partitions and page count is held in sys.allocation_units. sys.partitions is visible to you in both SQLDW and APS; however, sys.allocation_units is not visible to the end user at this time. I have referenced these locations for information and context for those familiar with SQL Server internals.
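As a sketch (using the hypothetical table from earlier), you can check the row count the shell database currently holds with a query like this:

SELECT t.name      AS table_name,
       SUM(p.rows) AS shell_row_count
FROM   sys.tables     AS t
JOIN   sys.partitions AS p ON p.object_id = t.object_id
WHERE  t.name = 'FactSales_Empty'
  AND  p.index_id IN (0, 1)   -- heap or clustered index rows only
GROUP BY t.name;
-- before any statistics exist this shows the fixed default; once the
-- table-level information has been updated it shows the merged counts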

At the end of this stage the metadata held in the shell database on the CONTROL node has been updated for both row count and page count. There is now no difference between a table created by CREATE TABLE and a CTAS - both know the size.

2) Column level statistics object is created on every distribution on the COMPUTE nodes

The statistics object must be created in every distribution on every COMPUTE node. Creating these statistics objects produces important, detailed statistical data for the column (notably the histogram and the density vector).

This information is used by APS and SQLDW for generating distribution-level SMP plans. SMP plans are used by APS / SQLDW in the PHYSICAL layer only, so at this point the statistical data is not in a location that can be used for generating MPP plans: the information is distributed and not accessible in a timely fashion for cost-based optimisation. Therefore, a third step is necessary...

3) Column level statistics object is created and updated on the CONTROL node

Once the data has been created PHYSICALLY on the distributions in the COMPUTE layer, it must be brought together and held LOGICALLY to facilitate MPP plan cost-based optimisation. The shell database on the CONTROL node therefore also creates a statistics object, which is a LOGICAL representation of the statistics objects held in the distributions.

However, the shell database stat does not yet reflect the column level statistical information held PHYSICALLY in the distributions on the COMPUTE nodes. Consequently, the statistics object in the shell database on the CONTROL node needs to be UPDATED immediately after it has been created.

DBCC SHOW_STATISTICS (table_name, stat_name) WITH STAT_STREAM is used to do this.

Notice that the command has a second parameter. This changes the result set, providing APS / SQLDW with all the information required to build a LOGICAL view of the statistics object for that column.
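As a hedged aside (statistics name taken from the hypothetical sketch above), the documented two-argument form of the same command lets you inspect that detail yourself, returning the statistics header, the density vector and the histogram:

DBCC SHOW_STATISTICS ('dbo.FactSales_Ctas', 'stat_FactSales_CustomerKey');
-- returns the header, density vector and histogram that feed
-- cost-based optimisation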

I hope this goes some way to explaining what you were seeing, and also how statistics are created and why they are important for Azure SQL DW and for APS.

Rob Farley
JRJ