I have an audit table with about 8 million records. I recently added two new columns that I need to populate from the existing columns according to some rules/conditions. Originally, whenever a FK was updated in a table, the old and new FK ids were stored in the audit table. For example:

Table A
ID      Name
1       First A
2       Second A
3       Third A

Table B
ID   AID  Name
1     1   First B
2     1   Second B
3     2   Third B

Audit 
ID  TableName FieldName OldValue NewValue 

Now if I update the first record of Table B from `1 1 First B` to `1 3 First B`, the audit table stores the change as:

Audit 
ID  TableName FieldName OldValue NewValue 
1    Table B  AID        1          3

I have now altered the Audit table to also store the actual text value of the FK, i.e. the change above will be stored as:

Audit 
ID  TableName FieldName OldValue  NewValue  OldText   NewText
1    Table B   AID       1         3        First A   Third A

The problem is that I already have about 8 million existing records for which I need to populate the new columns. I have written the query below to do that:

declare @sql nvarchar(max);
declare @start int = 1

while @start <= 8000000 
begin
  select top 10000  @sql = COALESCE(@sql+'Update Audit set ','Update Audit set') + 
    isnull(' OldText = ('+ dbo.GetFKText(i.TableName, i.FieldName)+case when len(isnull(i.OldValue,'')) < 1 then null else i.OldValue end +'),',' OldText = OldValue, ')  + 
    isnull(' NewText = ('+ dbo.GetFKText(i.TableName, i.FieldName)+case when len(isnull(i.NewValue,'')) < 1 then null else i.NewValue end +')',' NewText = NewValue ')    +  
    ' where AuditID = '+cast(i.AuditID as nvarchar(200))+' and lower(ltrim(rtrim(TableName))) <> ''audit'';' 
    from Audit i where i.AuditID >= @start 

  exec sp_executesql @sql

  set @start = @start+10000;
end

The GetFKText function is below (basically I am getting the column whose name is (TableName)+'Name' or (TableName)+(SomeText)+'Name'; this is just a convention I have followed in all the tables):

     CREATE FUNCTION dbo.GetFKText (@TableName nvarchar(200), @FieldName nvarchar(200))
     RETURNS nvarchar(max)
     AS
     BEGIN
       declare @fn nvarchar(200);
       declare @ttn nvarchar(200);
       declare @tcn nvarchar(200);

       -- Find the table/column that the FK on (@TableName, @FieldName) references
       SELECT top 1
           @ttn = kcu.table_name
          ,@tcn = kcu.column_name
       FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
       INNER JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
          ON ccu.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
       INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
          ON kcu.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
       WHERE ccu.TABLE_NAME = @TableName and ccu.COLUMN_NAME = @FieldName;

       if isnull(@ttn,'') != '' and isnull(@tcn,'') != ''
       begin
         -- Pick the display column by naming convention, best match first
         select @fn = COLUMN_NAME
         from (SELECT top 1 COLUMN_NAME,
                     case when COLUMN_NAME like (@ttn+'Name')  then 0
                          when COLUMN_NAME like (@ttn+'%Name') then 1
                          when COLUMN_NAME like (@ttn+'Code')  then 2
                          when COLUMN_NAME like (@ttn+'%Code') then 3
                          else 4 end as CPriority
               FROM JVO.INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @ttn
                 and (COLUMN_NAME like '%Name' or COLUMN_NAME like '%Code')
               order by CPriority) as aa;

         RETURN 'select '+@fn+' from '+@ttn+' where '+@tcn+' = ';
       end

       RETURN null;
     END
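For the sample schema above, a call like the following shows what the function feeds into the dynamic UPDATE (hypothetical, assuming the actual table names are A and B rather than the display names "Table A"/"Table B"):

```sql
-- Hypothetical call for the FK column B.AID referencing A.ID.
-- A's Name column matches neither 'A%Name' nor 'A%Code', so it falls
-- through to CPriority 4 and is picked by the '%Name' filter:
select dbo.GetFKText('B', 'AID');
-- should return a prefix like: select Name from A where ID =
```

The dynamic UPDATE then appends the stored id, producing a correlated scalar subquery such as `(select Name from A where ID = 1)`.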

It's working but really slow: it updates about 1 million records in 13 hours. Can anyone help improve this query or suggest an alternative way to do the update?

Thanks

Bhavesh
  • It appears that you are not clearing @sql between loop iterations. You are building 10,000 statements, executing them, then appending 10,000 more, executing those 20,000, and so on. – David Dubois May 04 '18 at 01:28
  • Also, you are using TOP without ORDER BY. There is no way of knowing which 10,000 records you are processing in each loop iteration. – David Dubois May 04 '18 at 01:30
  • To help understand, why not try something simpler that you can test: make the loop while <= 100, and use TOP 5 and +5. After the SELECT, print @sql and print len(@sql), and watch the messages as the statement executes. How many "1 row affected" messages are generated in each iteration? Add +char(13) to the end of the SELECT so that you can see all the statements easily when printed. – David Dubois May 04 '18 at 01:42
  • I have no idea what context switching costs in SQL Server, but you are dynamically building a SQL string and then executing it, repeated 8 million times. It could be that building all the SQL statements first and writing them to a file (i.e. not executing), massaging the file (breaking it into parts, adding commits, ...), and then running the scripts (i.e. not as dynamic SQL) is more efficient and less error prone. Another approach: build a CSV to bulk load, but that would depend on how many distinct tables and columns are being dealt with. – Glenn May 04 '18 at 02:05
  • Thanks, David, for finding the bug, and Glenn for the alternative approach. Since the script is already there, I will first just update it to fix what David has pointed out; I think that will probably fix it. If it still takes too long, I will generate the script, save it to a file, and run it in batches. – Bhavesh May 04 '18 at 06:09
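With David's two fixes applied, the batch loop might look like this (a sketch only: @sql is reset every iteration, each batch is bounded by an explicit AuditID range instead of an unordered TOP, and the TableName filter is moved out of the generated statements, which is my own simplification):

```sql
declare @sql nvarchar(max);
declare @start int = 1;

while @start <= 8000000
begin
  set @sql = N'';  -- reset, otherwise previous batches are re-executed

  select @sql = @sql + 'Update Audit set' +
    isnull(' OldText = (' + dbo.GetFKText(i.TableName, i.FieldName)
           + case when len(isnull(i.OldValue,'')) < 1 then null else i.OldValue end
           + '),', ' OldText = OldValue,') +
    isnull(' NewText = (' + dbo.GetFKText(i.TableName, i.FieldName)
           + case when len(isnull(i.NewValue,'')) < 1 then null else i.NewValue end
           + ')', ' NewText = NewValue') +
    ' where AuditID = ' + cast(i.AuditID as nvarchar(20)) + ';'
  from Audit i
  where i.AuditID >= @start
    and i.AuditID <  @start + 10000   -- deterministic batch, no TOP needed
    and lower(ltrim(rtrim(i.TableName))) <> 'audit';

  if len(@sql) > 0
    exec sp_executesql @sql;

  set @start = @start + 10000;
end
```

The `if len(@sql) > 0` guard also avoids calling sp_executesql with an empty batch when an AuditID range contains no matching rows.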
  • Did you consider creating a copy of the table and adding the new columns while doing so (e.g. select ..., case...end AS newcol1, case...end AS newcol2 INTO newTable FROM oldTable), then dropping the old table and renaming the new one to the old table's name? It requires recreating indexes etc., but might work. Another approach would be to truncate the old table after copying it (see above) and then bulk insert the copied records back into it. – Tyron78 May 04 '18 at 08:24
  • @Tyron78: I'm not sure how that would be possible here, since part of the query (the subquery that gets the Name column of the foreign table) is generated dynamically based on the stored value. Any idea? – Bhavesh May 06 '18 at 11:12
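A set-based variant can still work if the dynamic part is hoisted out of the per-row loop: dbo.GetFKText depends only on (TableName, FieldName), so there are only as many distinct subquery prefixes as there are distinct FK columns. A sketch, assuming the table and column names from the question (the cursor name and the null-prefix fallback are my own choices):

```sql
-- One dynamic UPDATE per distinct (TableName, FieldName) pair
-- instead of one statement per audit row.
declare @tn nvarchar(200), @fn nvarchar(200), @prefix nvarchar(max), @sql nvarchar(max);

declare c cursor local fast_forward for
  select distinct TableName, FieldName from Audit
  where lower(ltrim(rtrim(TableName))) <> 'audit';

open c;
fetch next from c into @tn, @fn;
while @@fetch_status = 0
begin
  set @prefix = dbo.GetFKText(@tn, @fn);  -- e.g. 'select Name from A where ID = '

  if @prefix is not null
  begin
    -- Set-based update of every audit row for this FK column at once,
    -- using a correlated scalar subquery built from the prefix.
    set @sql = N'update a set'
      + N' OldText = case when len(isnull(a.OldValue,'''')) < 1 then a.OldValue'
      + N'   else (' + @prefix + N' a.OldValue) end,'
      + N' NewText = case when len(isnull(a.NewValue,'''')) < 1 then a.NewValue'
      + N'   else (' + @prefix + N' a.NewValue) end'
      + N' from Audit a where a.TableName = @tn and a.FieldName = @fn;';
    exec sp_executesql @sql, N'@tn nvarchar(200), @fn nvarchar(200)', @tn, @fn;
  end
  else
  begin
    -- Mirrors the original fallback when no FK lookup is found.
    update a set OldText = a.OldValue, NewText = a.NewValue
    from Audit a where a.TableName = @tn and a.FieldName = @fn;
  end

  fetch next from c into @tn, @fn;
end
close c;
deallocate c;
```

This replaces millions of single-row statements with a handful of bulk updates, which is usually where most of the 13 hours goes.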
