Azure Data Warehouse Insert to Huge Table

Question

The agreed pattern to insert new data into an already existing table in Azure Data Warehouse seems to be...

create table dbo.MyTable_New
with (distribution = round_robin)
as

select
    Col1
    ,Col2
from dbo.MyTable

union all

select
    Col1
    ,Col2
from dbo.MyNewStuff;

Now, what we are seeing is that on really large tables this will degrade over time, and it's obvious why you are reading everything you have already, and re-inserting it.. this seems sub-optimal to me...

Does anyone have any alternative approaches that they have seen work for them, i am thinking of things like partition switching for example...

score 2 · Answer 1 · answered Jul 07 '17 at 11:15

Agreed with who? CTAS can be an effective option, but it really depends on your data volumes, SLAs, business processes etc. If you are inserting small volumes into an already large table for example, I'm not sure CTAS makes sense, why not INSERT? You can use partition switching in Azure SQL Data Warehouse (eg here). Other options include Azure Data Factory, SSIS (which now supports Azure SQL Data Warehouse), Polybase, plain INSERT etc

Please review the following article which offers a number of strategies for loading data into Azure SQL Data Warehouse:

https://blogs.msdn.microsoft.com/sqlcat/2017/05/17/azure-sql-data-warehouse-loading-patterns-and-strategies/

lol, good point on the insert into.. that is relatively new syntax, you always used to need CTAS :) — m1nkeh, Jul 07 '17 at 11:50
`SELECT ... INTO` is new syntax. Plain old `INSERT` (along with `DELETE` and `UPDATE`) has always been there. — wBob, Jul 07 '17 at 12:05

Azure Data Warehouse Insert to Huge Table

1 Answers1