0

I have created an SSIS package to extract and load data from access table (.mdb) named TmShipping to a SQL table say, TmShippingImport, this SQL table has Id and importedDate as additional columns. I have scheduled the package to run for every 30 minutes.

TmShipping.mds
----------------------------------------
OrderId   CarrierId    TotalCharge


TmShippingImport (SQL table)
-------------------------------------------------------
Id   OrderId   CarrierId    TotalCharge     importedDate

In the Data Flow Task:

I am getting the data from source using OLD EB connection and extracting all row data from the access table and the output of this is connected to a Recordset Destination so that I can extract each row.

In the Control flow task:

I have a loop container (connected to data flow task's o/p) which inserts each row into the sql table by a sql query and loads all the row data along with the current datetime.

Package execution

The SSIS package when executed for the first time loads each row into the SQL Table and add a DateTime to imporatDatetime Column. When new records are created in the Access table, the package now takes all the rows in the MS access (rows that were extracted previously and new rows) and loads them in the SQL table again. How to avoid duplicates? My primary key in SQL table is ID which is not present in the MS Access Table.

Tried using Lookup table in the Dataflow in between source and recordset destination but it failed saying I can't connect the available column to BLOB.

Should I be trying with Lookup Merge in the dataflow or should I make changes in the foreachloop container so that it checks for duplicates before inserting the rows into the sql table or...?

C82
  • 61
  • 6
  • Does the Access table have a unique key at all? – Tab Alleman Feb 09 '18 at 14:32
  • Yes, OrderID is the unique key to the MS Access table. And Id is the primary key in SQL table. We can't insert a new column in the MS Access table as it's read by a third party application. – C82 Feb 09 '18 at 16:07
  • Then what is your question? As long as the source data has a unique key, there are a hundred articles and posts about how to de-dupe incoming data in SSIS. – Tab Alleman Feb 09 '18 at 19:00
  • 1
    Possible duplicate of [SSIS 2012 - Insert new rows, ignore existing rows](https://stackoverflow.com/questions/19011049/ssis-2012-insert-new-rows-ignore-existing-rows) – Tab Alleman Feb 09 '18 at 19:03

1 Answers1

0

First of all, you're approach to inserting the data into the SQL table seems very expensive. Is there a reason you can't simply insert the data in to the SQL table in the data flow using a DataFlow destination, it will be much, much more performant.

If the reason you've not done this is because of the timestamp, you can achieve this be using a derived column transformation and GetDate().

Once you've done that you can implement the pattern used in the answer Tab linked to.

Andy
  • 25
  • 3
  • I used recordset so that I can load one row at a time (not in batch) to the SQL table and get the corresponding timestamp for each row loaded at the SQL table. – C82 Feb 11 '18 at 11:39