
We are using dbt in combination with SQL Server 2019, and the usage of CTEs is a huge performance drag for us.

The main issue with the CTEs is that they are deeply nested over several levels, e.g. CTE_L1 refers to CTE_L2, and CTE_L2 refers to CTE_L3. The execution plan looks something like the screenshot below: the same code is executed over and over again, and most of the branches re-run the code of the lowest level every time:

[Screenshot: complicated execution plan]
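
To give a feeling for the pattern (heavily simplified, the table and column names below are made up), the compiled SQL looks something like this, with the lowest level referenced from several branches:

```sql
-- Simplified illustration of the nesting (names are made up).
-- CTE_L3 is the lowest level; CTE_L2 and CTE_L1 both reference it,
-- so SQL Server may expand and re-execute it once per reference.
with CTE_L3 as (
    select customer_id, sum(amount) as total_amount
    from dbo.sales
    group by customer_id
),
CTE_L2 as (
    select customer_id, total_amount
    from CTE_L3
    where total_amount > 0
),
CTE_L1 as (
    select l2.customer_id, l2.total_amount, l3.total_amount as check_amount
    from CTE_L2 as l2
    join CTE_L3 as l3 on l3.customer_id = l2.customer_id
)
select * from CTE_L1;
```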

Without dbt we would just use temporary tables in tempdb (#tbl_L1, #tbl_L2, etc.), which can then be reused by lower levels. We tried that (without dbt) and it is much, much faster than using the CTEs. Is there any way to utilize tempdb instead of CTEs? We are using this adapter for SQL Server: https://github.com/dbt-msft/dbt-sqlserver
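
Outside of dbt, the hand-written tempdb version looks roughly like this (again simplified, names are made up); each level is computed exactly once and then reused:

```sql
-- Each level is materialized once in tempdb and reused by the later levels.
select customer_id, sum(amount) as total_amount
into #tbl_L3
from dbo.sales
group by customer_id;

select customer_id, total_amount
into #tbl_L2
from #tbl_L3
where total_amount > 0;

select l2.customer_id, l2.total_amount, l3.total_amount as check_amount
into #tbl_L1
from #tbl_L2 as l2
join #tbl_L3 as l3 on l3.customer_id = l2.customer_id;

select * from #tbl_L1;
```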

Materializing every level is not an option. The depth is dynamic and the number of tables is too high.

Chris
  • The main issue is that the query is far too complicated and the entire process needs rethinking. A CTE is little different from a VIEW. If you use `dbt` for ETL, focus on the *T* - the data gets transformed in flight, it's not loaded into intermediate tables. That doesn't mean writing 200 nested views or CTEs; it means processing the data in a pipeline, with each transformation processing one row at a time and forwarding the row to the next – Panagiotis Kanavos Jun 28 '23 at 08:07
  • The reason dbt tutorials show CTEs is that in Redshift, BigQuery etc. the CTEs effectively become pipeline steps, unlike relational databases such as SQL Server, MySQL, Oracle, PostgreSQL (actually all of them), where they're part of a single query. And when they don't, cloud databases will happily scale up to handle the extra load while charging you for it. – Panagiotis Kanavos Jun 28 '23 at 08:13

0 Answers