I need to update a large table, roughly 3B rows. Each record needs to update 4 fields of decimal(18,2) datatype. I have recreated the table with the corrected data to update the original table. Below is the statement I have been testing. It works, but It takes about 30 minutes to update 6m records. This means that updating the Original table will take a very long time.
First, when I test this statement it starts off pretty fast but then slows down significantly as more records are updated. Why is this the case and what can I alter to speed up the process?
Second, I have been working on simplifying the joins and where clauses, but this was a pretty complex table to begin with. Based on the statement below, what else can I do to improve performance and shorten the time to update? For example, I know its standard to do batches in rows of 10k, but is it worthwhile to try more like 100k or even 1m at a time? Or would that cause some other unforseen issue like filling up logs disk space or filling up disk space in the temp directory?
use Stage
go
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
go
DECLARE @batchId INT
DECLARE @batchSize INT
DECLARE @results INT
SET @results = 1
SET @batchSize = 10000
SET @batchId = 0 -- when 0 rows returned, exit the loop
Drop table If Exists #tempupdate
select
Identity(int,1,1) as rowkey,
a.Rm835HeaderId
,a.Rm835ClaimHeaderId
,a.Rm835ClaimPmtId
,a.Rm835ClaimServiceLineId
,a.ClaimLineInd
,a.ClaimAdjustGroupCode
,a.RemitDate
,b.totalclaimchargeamt as TotalClaimChargeAmt_d
,b.ClaimPaymentAmt as ClaimPaymentAmt_d
,b.LineitemChargeamt as lineitemchargeamt_d
,b.Amount as amount_d
,(a.amount - b.amount) as difff
into #tempupdate
from [stage].[stage].[DetailRefreshtest]a
inner join [RealMedstage].[reporting].[DetailRefresh] b on
a.HeaderId = b.HeaderId
and a.ClaimHeaderId = b.ClaimHeaderId
and a.ClaimPmtId = b.ClaimPmtId
and a.ClaimServiceLineId = b.ClaimServiceLineId
and a.ClaimLineInd = b.ClaimLineInd
and a.ClaimAdjustGroupCode = b.ClaimAdjustGroupCode
where a.RemitDate between '2021-04-01 00:00:00.000'and '2021-04-02 00:00:00.000'
and b.RemitDate between '2021-04-01 00:00:00.000'and '2021-04-02 00:00:00.000'
and a.amount is not null and abs(a.amount - b.amount)<=.5
WHILE (@results > 0)
BEGIN
BEGIN TRY
BEGIN TRAN;
update [stage].[stage].[DetailRefreshtest]
set [TotalClaimChargeAmt] =b.TotalClaimChargeAmt_d
,[ClaimPaymentAmt] = b.ClaimPaymentAmt_d
,[LineItemChargeAmt] = b.lineitemchargeamt_d
,[Amount] = b.amount_d
,[opsinsertid]=0
from #tempupdate b
inner [stage].[stage].[DetailRefreshtest] a on
a.HeaderId = b.HeaderId
and a.ClaimHeaderId = b.ClaimHeaderId
and a.ClaimPmtId = b.ClaimPmtId
and a.ClaimServiceLineId = b.ClaimServiceLineId
and a.ClaimLineInd = b.ClaimLineInd
and a.ClaimAdjustGroupCode = b.ClaimAdjustGroupCode
where a.amount is not null and abs(b.difff)<=.5 --(a.amount - b.amount_d)<=.5 and (a.amount - b.amount_d)>=-.5
and (b.rowkey >@batchId and b.rowkey<= @batchId + @batchSize)
SET @results = @@ROWCOUNT
-- next batch
SET @batchId = @batchId + @batchSize
COMMIT TRAN;
END TRY
BEGIN CATCH
SELECT
ERROR_NUMBER() AS ErrorNumber,
ERROR_SEVERITY() AS ErrorSeverity,
ERROR_STATE() AS ErrorState,
ERROR_PROCEDURE() AS ErrorProcedure,
ERROR_LINE() AS ErrorLine,
ERROR_MESSAGE() AS ErrorMessage
rollback tran;
END CATCH
End