3

My use case is to accept Azure Synapse DW DDL as input, parse it and extract SQL statements from it. Additionally I can benefit from not only the base statements (viz, DECLARE, CREATE, ALTER, SELECT, ...) but also other DDL constructs like PK constraints, Indexing, Distribution, partitioning etc.

Reaching out to understand if there is a known way to achieve this? Much thanks in advance.

Things I've tried:

  1. TsqlParser (all 8 available versions 80,90,100,110,120,130,140,150) but it fails to parse the DDL and errors out on few tokens which are specific to DW.

  2. Passed a custom TSqlConcreteFragmentVisitor to TSqlFragment but it didn't detect Create table statements either (It could do a visit on SELECT statements though, not shown in attached DDL here)

  3. Used Microsoft.SqlServer.Management.SqlParser.Parser.Scanner to read SQL batches from the DDL but it identified the whole DDL as a single batch. This is not helpful as I am essentially getting the same output as the input I passed in.

  4. Used TSQL.TSQLStatementReader to tokenise the DDL but it gives individual tokens and not SQL statements (For example, in fiddle here, PK constraint NOT ENFORCED is tokenised as 2 tokens and not a single constraint as it should be.

Pls make a note that if I remove DW specific constructs like PK constraint NOT ENFORCED and WITH (...) block manually, then TsqlParser successfully parses the DDL and correctly brings out Create statements with tables, columns and datatypes.

Sample DDL:

IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = N'tempschema_main')

EXEC('CREATE SCHEMA [schema_main]')

CREATE TABLE [schema_main].[temp_table]

(
    [TempId1] NVARCHAR(128) NOT NULL,

    [TempId2] NVARCHAR(128) NOT NULL,

    CONSTRAINT PK_TempPkColumnId PRIMARY KEY NONCLUSTERED (TempId1, TempId2) NOT ENFORCED
)

WITH 
(
    CLUSTERED COLUMNSTORE INDEX,

    DISTRIBUTION = ROUND_ROBIN
)
Dale K
  • 25,246
  • 15
  • 42
  • 71

1 Answers1

0

As you are probably discovering, SQL parsing is not simple - given the number of different ways there are of writing the same statement, the number of options there are available within a statement and the number of levels of statement there can be within a statement (i.e. selects within selects within selects), plus all the syntax that is not standard but is specific to a particular DBMS.

The options you have are probably as follows:

  1. You get lucky and the functionality you have available does what you want. From what you have stated in your question, this is not the case for you as the things you have tried don't work
  2. If all the DDL you want to process is structured in the same way, then it might be easier to write your own parser (javascript, VBA, etc) that does exactly what you want with just the data that you have
  3. Use a commercial SQL parser. The only one I'm aware of that really works is SQLParser. If you come across any syntax that doesn't parse properly with their tool then they are pretty quick at fixing it. There is a free version but that is (I think) limited to a couple of thousand characters in your SQL statement so you may need to pay for it
NickW
  • 8,430
  • 2
  • 6
  • 19