I have imported over 400 million records into a dummy dimension table. I need to take my existing fact table and join it to the dummy dimension to perform an update on the fact table. To avoid filling up the transaction log, somebody suggested I perform the update in a loop instead of updating hundreds of millions of records at once. I have researched loops and using WAITFOR DELAY, but I am not sure of the best approach for writing the logic out.

Here is the sample update I need to perform:

Update f 
set f.value_key = r.value_key
FROM [dbo].[FACT_Table] f 
INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
and r.calendar_key = f.calendar_key
WHERE f.date_Key > 20130101
AND f.date_key < 20141201
AND f.diff_key = 17

If anybody has a suggestion on the best way to write this, I would really appreciate it.

smcdonald
  • Can you just use that "some_key" there? It seems to be a date, so you could run the update for example for 7 days at the time, then sleep and run the next 7 days? – James Z Feb 26 '15 at 15:34
  • possible duplicate of [Update query on millions of rows fills the transaction log](http://stackoverflow.com/questions/3083733/update-query-on-millions-of-rows-fills-the-transaction-log) – Andrew Feb 26 '15 at 15:54
  • Perhaps my answer here may help? http://stackoverflow.com/questions/28347233/set-value-to-a-new-datetime-column-in-a-table-with-over-5-million-rows/28347697#28347697 – Elliveny Feb 26 '15 at 16:00

4 Answers

To avoid filling up the transaction log, you can set your recovery model to SIMPLE on your dev machines; that prevents the transaction log from bloating when transaction log backups aren't being taken.

ALTER DATABASE MyDB SET RECOVERY SIMPLE;

If you want the update to run faster, use a table hint, e.g. WITH (TABLOCK).
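For illustration, the question's UPDATE with that hint might look like the sketch below. Note that TABLOCK takes a table-level lock, so concurrent writers are blocked while the statement runs:

```sql
-- Sketch only: the same statement as in the question, with a table hint
-- on the target table in the FROM clause.
UPDATE f
SET    f.value_key = r.value_key
FROM   [dbo].[FACT_Table] f WITH (TABLOCK)
INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
                            AND r.calendar_key = f.calendar_key
WHERE  f.date_Key > 20130101
  AND  f.date_key < 20141201
  AND  f.diff_key = 17;
```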

StackUser

Please don't do what the previous person suggested unless you really understand what else will happen. The most important consequence is that you lose the ability to do a point-in-time restore. If you take a full backup every night and a transaction log backup every hour (or every 15 minutes), switching to the SIMPLE recovery model breaks the log chain, and you can only restore to the time of the last full backup.

If you do it, you have to switch to SIMPLE, do your work, switch back to FULL, take a full backup, and then resume taking log backups on a schedule. What the previous person suggests is like driving without bumpers to save on car weight: it sounds great until you hit something.
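If you do go that route, the sequence looks roughly like the sketch below (the database name MyDB comes from the earlier answer; the backup path is a placeholder):

```sql
ALTER DATABASE MyDB SET RECOVERY SIMPLE;

-- ... run the large batched update here ...

ALTER DATABASE MyDB SET RECOVERY FULL;

-- A fresh full backup restarts the log chain so log backups are valid again.
BACKUP DATABASE MyDB TO DISK = N'D:\Backups\MyDB_full.bak';  -- placeholder path

-- ...then resume the scheduled BACKUP LOG jobs.
```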

benjamin moskovits
-- Run this section of code one time to create a global queue containing only
-- the columns needed to identify a unique row to process. Update the SELECT
-- statement as necessary.

IF OBJECT_ID('tempdb.dbo.##GlobalQueue') IS NOT NULL
    DROP TABLE ##GlobalQueue;

SELECT f.some_key, f.calendar_key
INTO ##GlobalQueue
FROM [dbo].[FACT_Table] f
INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
                            AND r.calendar_key = f.calendar_key
WHERE f.date_Key > 20130101
  AND f.date_key < 20141201
  AND f.diff_key = 17;


-- Copy/paste the SQL below into multiple sessions/clients if you want to
-- process things faster than a single session can. Start with 1 session and
-- add more as needed to ramp up the processing load.

IF OBJECT_ID('tempdb..#RowsToProcess') IS NOT NULL
    DROP TABLE #RowsToProcess;

CREATE TABLE #RowsToProcess  -- match the column types to your actual keys
    ( some_key INT, calendar_key INT );

WHILE 1 = 1
    BEGIN
        DELETE TOP ( 10000 )  -- raise this number for a bigger 'bite' per pass
        FROM ##GlobalQueue WITH ( READPAST )
        OUTPUT Deleted.* INTO #RowsToProcess;

        IF @@ROWCOUNT > 0
        BEGIN
             UPDATE f
             SET f.value_key = r.value_key
             FROM [dbo].[FACT_Table] f
             INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
                                         AND r.calendar_key = f.calendar_key
             INNER JOIN #RowsToProcess RTP ON f.some_key = RTP.some_key
                                          AND f.calendar_key = RTP.calendar_key;

             TRUNCATE TABLE #RowsToProcess;

             --WAITFOR DELAY '00:01' --(Commented out 1 minute delay, as running
             -- multiple sessions to process the records eliminates the need for
             -- it. Here for demonstration, as it may be useful elsewhere.)
        END
        ELSE
             BREAK
    END
Use TOP and <>.
In an UPDATE you must write TOP with parentheses, i.e. TOP (n):

while 1 = 1
    begin
      update top (100) [test].[dbo].[Table_1] 
         set lname =  'bname' 
       where lname <> 'bname'
      if @@ROWCOUNT = 0 break
    end

while 1 = 1
    begin
      Update top (100000) f 
         set f.value_key = r.value_key
        FROM [dbo].[FACT_Table] f 
       INNER JOIN dbo.dummy_table r 
          ON f.some_key = r.some_Key
         and r.calendar_key = f.calendar_key
         AND f.date_Key > 20130101
         AND f.date_key < 20141201
         AND f.diff_key = 17
         AND f.value_key <> r.value_key
      if @@ROWCOUNT = 0 break
    end
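A variant of the same loop, assuming the database has been switched to the SIMPLE recovery model: issuing a CHECKPOINT after each committed batch lets SQL Server reuse the log space, and WAITFOR DELAY gives other workloads a breather between batches (the batch size and delay here are arbitrary):

```sql
WHILE 1 = 1
    BEGIN
      UPDATE TOP (100000) f
         SET f.value_key = r.value_key
        FROM [dbo].[FACT_Table] f
       INNER JOIN dbo.dummy_table r
          ON f.some_key = r.some_Key
         AND r.calendar_key = f.calendar_key
       WHERE f.date_Key > 20130101
         AND f.date_key < 20141201
         AND f.diff_key = 17
         AND f.value_key <> r.value_key  -- skip rows already updated

      IF @@ROWCOUNT = 0 BREAK

      CHECKPOINT                  -- in SIMPLE recovery, allows log space reuse
      WAITFOR DELAY '00:00:05'    -- optional pause between batches
    END
```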
paparazzo