
We have a denormalized table in our SQL Server data model that we want to migrate to a normalized format. However, the query we have created for this migration is causing a significant delay. We have checked the actual execution plan and found that the estimated row count is much higher than the actual row count.

Old Table Data Model:

Column A (PK Column): Long
Column1: String
Column2: String
Column3: String

New Table Data Model:

Column A (PK Column): Long
Column1: Long (FK value by referencing another table)
Column2: Long (FK value by referencing another table)
Column3: Long (FK value by referencing another table)
insert into newTable (pkcolumn, column1, column2, column3)
select 
    pkColumn,
    anothertable1.id,
    anothertable2.id,
    anothertable3.id
left join 
    anothertable as anothertable1 on anothertable1.field_id = 1 
                                  and anothertable1.value = oldtable.column1 
                                  and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join 
    anothertable as anothertable2 on anothertable2.field_id = 2 
                                  and anothertable2.value = oldtable.column2 
                                  and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join 
    anothertable as anothertable3 on anothertable3.field_id = 3 
                                  and anothertable3.value = oldtable.column3 
                                  and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)

Batch size: 1,000 key values per batch, i.e. oldtable.pkColumn in (1, 2, 3, ..., 1000)

Problem

The query we have created for the migration from the denormalized table to the normalized table is causing a significant delay. In the actual execution plan, the estimated row count is much higher than the actual row count, even though we have already updated the statistics of the tables involved in the query and rebuilt the indexes on the columns used in the query.

Question

Is there any way to increase the performance of the migration in this scenario?

Additional information:

Query Plan: Link

  • How large are your denormalized source data tables in terms of number of rows? Do they have the appropriate indexes to run the join query you have? Also make sure you have statistics run on the tables - that will ensure that the optimizer will choose the proper query execution plan – Sumit S May 15 '23 at 03:37
  • Please share the query plan via https://brentozar.com/pastetheplan Please also show all indexes on relevant tables – Charlieface May 15 '23 at 03:41
  • The giant `in` is probably what is causing the issue. Use a table variable or temp table or TVP containing the IDs to join onto, and make sure that's indexed with a primary key (a sketch of this idea is shown after the comments). – Charlieface May 15 '23 at 03:43
  • @SumitS Our denormalized table has 250 columns and 5 lakh (500,000) rows of data – NaveenPrabu May 15 '23 at 05:32
  • Long does not exist as a datatype in MS SQL Server. Please post the complete DDL of your tables (the old one and the new one) including indexes (CREATE TABLE ... CREATE INDEX ...). – SQLpro May 15 '23 at 07:18
  • What about an INSERT with a JOIN on only field_id = 1, followed by two MERGEs for the other two? (You may also need to INSERT the rows that don't have any JOINed rows, in case you have rows in that condition...). Look at your data and see how many rows have 0, 1, 2 and 3 JOINs; that may help in finding the optimal solution. – p3consulting May 15 '23 at 07:23
  • Your query isn't even legal; you don't have a FROM clause. Also, why do you add the IN to the LEFT JOIN part? It should be part of the WHERE. With these problems, performance is the least of your issues – siggemannen May 15 '23 at 09:09
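
Following the temp-table suggestion above, here is a minimal sketch of that approach, assuming the table and column names from the question (the #batchKeys name and the BIGINT type are assumptions, not taken from the original post):

-- Hypothetical sketch: load the current batch of keys into an indexed temp table
-- instead of repeating a large IN list in every join condition.
CREATE TABLE #batchKeys (pkColumn BIGINT PRIMARY KEY);

INSERT INTO #batchKeys (pkColumn)
VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);  -- ... up to the batch size

INSERT INTO newTable (pkColumn, column1, column2, column3)
SELECT o.pkColumn, a1.id, a2.id, a3.id
FROM oldtable AS o
JOIN #batchKeys AS b ON b.pkColumn = o.pkColumn
LEFT JOIN anothertable AS a1 ON a1.field_id = 1 AND a1.value = o.column1
LEFT JOIN anothertable AS a2 ON a2.field_id = 2 AND a2.value = o.column2
LEFT JOIN anothertable AS a3 ON a3.field_id = 3 AND a3.value = o.column3;

DROP TABLE #batchKeys;

The idea is that the batch of keys becomes a small, indexed set the optimizer can estimate, rather than a long IN list repeated in every ON clause.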

2 Answers


Based on your comment that the denormalized table has 250 columns and 0.5 million rows, in my opinion designing indexes for this and trying to do it in a single SQL statement will both be difficult. So your options are:

  1. Run it in a programmatic, batched way in SQL - with proper indexes and statistics - if that is manageable and the source data is static (a rough sketch is given below)

  2. This is my choice - use an ETL tool to do this, or even Spark/Databricks or, if this is in the cloud, any cloud-based tool
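
A rough sketch of the batched approach in option 1, assuming the table and column names from the question; the loop variables and the batch size of 1,000 are assumptions for illustration:

-- Hypothetical batching loop: walk the key range in fixed-size windows
-- so each INSERT touches only a small slice of the source table.
DECLARE @batchStart BIGINT = 1;
DECLARE @batchSize  BIGINT = 1000;
DECLARE @maxKey     BIGINT = (SELECT MAX(pkColumn) FROM oldtable);

WHILE @batchStart <= @maxKey
BEGIN
    INSERT INTO newTable (pkColumn, column1, column2, column3)
    SELECT o.pkColumn, a1.id, a2.id, a3.id
    FROM oldtable AS o
    LEFT JOIN anothertable AS a1 ON a1.field_id = 1 AND a1.value = o.column1
    LEFT JOIN anothertable AS a2 ON a2.field_id = 2 AND a2.value = o.column2
    LEFT JOIN anothertable AS a3 ON a3.field_id = 3 AND a3.value = o.column3
    WHERE o.pkColumn BETWEEN @batchStart AND @batchStart + @batchSize - 1;

    SET @batchStart = @batchStart + @batchSize;
END;

Each iteration touches only a small window of keys, so the individual INSERT statements stay short.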

Sumit S

First, you can create the foreign keys only after having migrated the data. This will reduce the migration time, because the FK constraints are not checked during the inserts.
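
For example (a sketch only; the constraint names and the referenced id column are assumptions, since the question does not show the full DDL):

-- Load newTable first with no foreign keys, then add the constraints afterwards.
ALTER TABLE newTable WITH CHECK
    ADD CONSTRAINT FK_newTable_column1 FOREIGN KEY (column1) REFERENCES anothertable (id);

ALTER TABLE newTable WITH CHECK
    ADD CONSTRAINT FK_newTable_column2 FOREIGN KEY (column2) REFERENCES anothertable (id);

ALTER TABLE newTable WITH CHECK
    ADD CONSTRAINT FK_newTable_column3 FOREIGN KEY (column3) REFERENCES anothertable (id);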

Second, you can rewrite your JOIN conditions as:

ON anothertable1.field_id = 1 
   and anothertable1.value = oldtable.column1 
   and oldtable.pkColumn BETWEEN 1 AND 10

But I do not see any actual normalization of your table here.

This seems to me similar to this post, which I have already answered:

Best index for table with dynamic predicate

SQLpro