I am using a TRY...CATCH block to catch rows with constraint errors, e.g. when a NULL is inserted into a NOT NULL column, a duplicate record is inserted, or a type mismatch occurs. All source records with errors should go to an error log table and the rest of the records should go to the destination table. Because of the TRY...CATCH I can't use a bulk insert, so I am inserting row by row in a WHILE loop, which takes forever since I have to insert 3,000,000 records. Is there any way to improve the performance of the loop so it can insert 3,000,000 records in minimal time? Currently it is taking 2 hours or more :(
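For reference, the row-by-row pattern described in the question looks roughly like the sketch below; the table and column names (dbo.SourceData, dbo.Destination, dbo.ErrorLog, Id, Payload) are made up for illustration and a contiguous numeric key is assumed:

-- Rough sketch of the slow row-by-row (RBAR) approach; all names are hypothetical.
declare @Id as Int = 0;
declare @MaxId as Int;
select @MaxId = Max( Id ) from dbo.SourceData;

while @Id < @MaxId
  begin
  set @Id += 1;
  begin try
    -- One row at a time, so a constraint violation only affects this row.
    insert into dbo.Destination ( Id, Payload )
      select Id, Payload from dbo.SourceData where Id = @Id;
  end try
  begin catch
    -- A failing row is diverted to the error log together with the error message.
    insert into dbo.ErrorLog ( Id, Payload, ErrorMessage )
      select Id, Payload, Error_Message() from dbo.SourceData where Id = @Id;
  end catch;
  end;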
- Why are you using a loop to insert rows? That loop is the main cause of your poor performance. Create a stored procedure with TRY...CATCH and then insert all at once. I'm pretty sure that even if it fails, you can write some kind of logic that handles a NULL arriving in a NOT NULL column. – SqlKindaGuy Nov 09 '17 at 11:19
- I'm not even sure you can route output to a different table inside a BEGIN TRY/CATCH block, since it is transaction based, so I would work out some logic to avoid this problem. Can't you just insert the rows where ID is not null into your main table, and the rows where ID is null into your error log? – SqlKindaGuy Nov 09 '17 at 11:37
- Hi, the problem is that any record with any kind of error should move to the error log table. If I simply use TRY...CATCH and one record has an error, the whole batch goes to the CATCH block, which inserts the records into the error table, so I have to test each row and move the records accordingly. I'm looking for a faster approach to do the same. Thank you :) – Dilip Chauhan Nov 09 '17 at 11:37
- I don't see that approach as reasonable **IF** you know beforehand which cases will raise exceptions. **IF** you can select only the valid rows (non-null values for NOT NULL columns), insert just those into the destination and insert the invalid rows into the log using set-based logic rather than a **RBAR** procedural one (a sketch of that split follows these comments). – jean Nov 09 '17 at 15:57
- It's helpful to tag database questions with both the appropriate software (MySQL, Oracle, DB2, ...) and version, e.g. `sql-server-2014`. Differences in syntax and features often affect the answers. Note that `tsql` narrows the choices, but does not specify the database. – HABO Nov 10 '17 at 20:04
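As an illustration of the set-based split suggested in the comments above, the following sketch (with hypothetical table and column names, and only the simple NOT NULL check) routes good and bad rows in two passes instead of a loop:

-- Sketch: route rows in one set-based pass each instead of looping; all names are hypothetical.
-- Valid rows (here simply: SomeCol is not null) go straight to the destination ...
insert into dbo.Destination ( Id, SomeCol )
  select Id, SomeCol
    from dbo.SourceData
    where SomeCol is not null;

-- ... and the rows that would violate the NOT NULL constraint go to the error log.
insert into dbo.ErrorLog ( Id, SomeCol, ErrorMessage )
  select Id, SomeCol, 'SomeCol is null'
    from dbo.SourceData
    where SomeCol is null;

Duplicates can be screened the same way with NOT EXISTS against the destination table, and type mismatches with TRY_CONVERT on SQL Server 2012 or later.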
2 Answers
Try conducting your insert in batches. For instance, loop over the data attempting to insert 10,000/1,000/100 records at a time as a bulk insert. If there is an error in a batch, catch it and re-execute that batch as a row-by-row operation. You will have to experiment with the batch size, making it small enough that the majority of batches are processed as bulk inserts and only the occasional batch has to fall back to row-by-row processing.
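Very roughly, such a loop might look like the sketch below. This is only an outline of the idea, not code from the answer, and all table and column names (dbo.SourceData, dbo.Destination, dbo.ErrorLog, Id, Payload) are hypothetical:

-- Sketch of batch insert with row-by-row fallback; all names are hypothetical.
declare @BatchSize as Int = 10000;
declare @FromId as Int = 0;
declare @ToId as Int;
declare @RowId as Int;

while exists ( select 1 from dbo.SourceData where Id > @FromId )
  begin
  -- Determine the upper key of the next batch.
  select @ToId = Max( Id )
    from ( select top ( @BatchSize ) Id
             from dbo.SourceData
             where Id > @FromId
             order by Id ) as Batch;

  begin try
    -- Attempt the whole batch as one set-based insert.
    insert into dbo.Destination ( Id, Payload )
      select Id, Payload
        from dbo.SourceData
        where Id > @FromId and Id <= @ToId;
  end try
  begin catch
    -- The batch failed: re-run it row by row, diverting bad rows to the log.
    set @RowId = @FromId;
    while @RowId < @ToId
      begin
      set @RowId += 1;
      begin try
        insert into dbo.Destination ( Id, Payload )
          select Id, Payload from dbo.SourceData where Id = @RowId;
      end try
      begin catch
        insert into dbo.ErrorLog ( Id, Payload, ErrorMessage )
          select Id, Payload, Error_Message() from dbo.SourceData where Id = @RowId;
      end catch;
      end;
  end catch;

  set @FromId = @ToId;
  end;

Because a failing INSERT ... SELECT is normally rolled back as a single statement, the catch block can re-run the same key range row by row without creating duplicates.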

Hunter Nelson
- An alternative to switching immediately to row-by-row in the presence of an error is to split the batch into halves and reprocess them, in an effort to locate the problem row via a binary search. The performance will depend on the distribution of bad rows and the initial batch size. – HABO Nov 09 '17 at 19:54
The following demonstrates handling a pile of sample data in batches with "binary search" on the batch size in the event of an error.
set nocount on;

-- Set the processing parameters.
declare @InitialBatchSize as Int = 1024;
declare @BatchSize as Int = @InitialBatchSize;

-- Create some sample data with somewhat random Divisor values.
declare @RowsToProcess as Int = 10000;
declare @SampleData as Table ( Number Int, Divisor Int );
with Digits as (
  select Digit
    from ( values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ) as Digits( Digit ) ),
  Numbers as (
    select ( ( ( Ten_4.Digit * 10 + Ten_3.Digit ) * 10 + Ten_2.Digit ) * 10 + Ten_1.Digit ) * 10 + Ten_0.Digit + 1 as Number
      from Digits as Ten_0 cross join Digits as Ten_1 cross join Digits as Ten_2 cross join
        Digits as Ten_3 cross join Digits as Ten_4 )
  insert into @SampleData
    select Number, Abs( Checksum( NewId() ) ) % 1000 as Divisor -- Adjust "1000" to vary the chances of a zero divisor.
      from Numbers
      where Number < @RowsToProcess;

-- Process the data.
declare @FailedRows as Table ( Number Int, Divisor Int, ErrorMessage NVarChar(2048) );
declare @BitBucket as Table ( Number Int, Divisor Int, Quotient Int );
declare @RowCount as Int = 1; -- Force at least one loop execution.
declare @LastProcessedNumber as Int = 0;

while @RowCount > 0
  begin
  begin try
    -- Subject-to-failure INSERT.
    insert into @BitBucket
      select top ( @BatchSize ) Number, Divisor, 1 / Divisor as Quotient
        from @SampleData
        where Number > @LastProcessedNumber
        order by Number;
    set @RowCount = @@RowCount;
    select @LastProcessedNumber = Max( Number ) from @BitBucket;
    print 'Processed ' + Cast( @RowCount as VarChar(10) ) + ' rows.';
  end try
  begin catch
    if @BatchSize > 1
      begin
      -- Try a smaller batch.
      set @BatchSize /= 2;
      end
    else
      begin
      -- This is a failing row. Log it with the error and reset the batch size.
      set @LastProcessedNumber += 1;
      print 'Row failed. Row number ' + Cast( @LastProcessedNumber as VarChar(10) ) + ', error: ' + Error_Message() + '.';
      insert into @FailedRows
        select Number, Divisor, Error_Message()
          from @SampleData
          where Number = @LastProcessedNumber;
      set @BatchSize = @InitialBatchSize;
      end
  end catch
  end;

-- Dump the results.
select * from @FailedRows order by Number;
select * from @SampleData order by Number;
select * from @BitBucket order by Number;

HABO