2

I am copy data from SQL Server to Oracle with a SSIS package, and I want to improve performance of copying with incremental updating.

My idea is to :

  1. Append new records of table.(delete them before then append)
  2. Delete "Deleted" records

Code:

 drop table A1
 drop table A2

 --Old Table
 create Table A1(id int,
                 a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int, a8 int)

 --New Table
 create Table A2(id int,
                 a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int, a8 int)

 insert into A1 (id, a1, a2, a3, a4, a5, a6, a7, a8) 
 values (1,1,2,3,4,5,6,7,8), (2,2,2,5,4,2,6,2,8), (3,3,2,5,4,5,4,3,8),
        (4,3,1,3,4,5,7,3,8)

 insert into A2 (id, a1, a2, a3, a4, a5, a6, a7, a8) 
 values (1,2,2,3,4,5,6,7,8), (2,2,2,5,4,2,6,2,8), (3,3,2,5,4,5,4,3,8)
--1) Changed values  

 select * from a2
 except
 select * from a1;

 --2)  "Deleted" Values. If I Have primary key in A1,A2 the result is simply get by query
 select a1.* 
 from A1 
 left join A2 on a2.id = a1.id 
 where a2.id is null

But I don't have a primary key on my tables in my real situation. How I can achieve 2 point by another way, without join by primary key ?

Result of query:

id  a1  a2  a3  a4  a5  a6  a7  a8
1   2   2   3   4   5   6   7   8


id  a1  a2  a3  a4  a5  a6  a7  a8
4   3   1   3   4   5   7   3   8
Vitaly Ascheulov
  • 182
  • 2
  • 15

2 Answers2

3

By using MERGE statement you can do INSERT, UPDATE & DELETE in a single operation and it is ideal for delta load.

Sample code:

    MERGE A2 AS target
    USING (SELECT * FROM A1) AS source 
    ON (target.Id = source.Id AND target.a1 = Source.a1...) -- More join condition
    WHEN MATCHED THEN 
        UPDATE SET a2 = source.a2, a3 = source.a3 ------ more columns
WHEN NOT MATCHED THEN
    INSERT (Id, A1, a1, a2....)
    VALUES (source.Id, source.a1, source.a2.....) 

WHEN NOT MATCHED BY SOURCE
        THEN DELETE 
END;

MERGE statement also provides flexibility to capture inserted, deleted and updated records (in case, you need them to logging purpose) but I don't recommend them until you really need that information.

In above query you can keep the columns you want to compare

ON (target.Id = source.Id AND target.a1 = Source.a1...) -- More join

In SSIS

You can use LOOKUP transformation to find match and no match records and then insert or delete records. Any any case, if you are considering all columns in comparing source table with destination and update will make no sense.

Anuj Tripathi
  • 2,251
  • 14
  • 18
  • Merge is a good tool. Its works if both source and destinations is in SQL Server. But my source is SQL Server, and destination is Oracle. And I dont have primary key in tables. Merge uses join condition. My view is to transfer rows through data flow that need to append and delete. And then in Oracle do this action – Vitaly Ascheulov Oct 13 '15 at 08:26
  • @VitalyAscheulov How you are connecting to Oracle? I believe you would be having linked server connection in that case you should be able to use (**MERGE ORALCELinkedServer.A2 AS target**) – Anuj Tripathi Oct 13 '15 at 08:31
  • I'm connection through ADO.NET provider. Data flow with source - Sql server, destination - Oracle [Picture with dataflow](http://postimg.org/image/f3v55lqtl/) – Vitaly Ascheulov Oct 13 '15 at 08:35
  • As you don't have primary column then you have to include all the columns in MERGE statement **ON (target.Id = source.Id AND target.a1 = Source.a1...)** & if you are considering all columns then UPDATE makes no sense. – Anuj Tripathi Oct 13 '15 at 08:36
0

You can use CDC or other log shipping mechanism that are based on log files instead of queries. If you still want to use SSIS with queries -

Have you tried working with rowversion on SQL Server to find differences, it will be faster than except for sure.

select * from a2 where rowversion > @lastmaxrowversion;

To find deletion is an hard task, the best you can do is

select a1.id from a1 where not exists (select top 1 1 from a2 where a1.id = a2.id);

If you can add a for delete trigger and keep the ids in a table that will be quite good as well.

  • Keep in mind that many companies don't really care about deleted records in their ETL process, as storage is no longer a case. Also, Nowadays you can design your system with no deletes with partition table archiving or an active flag. Anyways, if deletion is still important, maybe try to make different intervals for insert/update vs delete. For example, every 10 minutes you check for insert/update where delete every 24 hour.
Johns
  • 31
  • 2