
The requirement is to move data older than 3 years from the production DB to an archive DB and, once moved, delete those records from the production DB, so that at any point in time the production DB holds only the last three years of records.

I want to achieve this with SSIS. I have read quite a few articles about data archival but couldn't figure out the best approach.

I am new to SSIS.

I want to achieve exactly what the answer in the link below does, with the extra condition of moving only those records which are older than 3 years and then deleting them.

https://dba.stackexchange.com/questions/25867/moving-data-from-one-db-to-another-using-ssis

An accepted answer should address:

  • scalability
  • complexity
  • failure handling
  • reliability
Ujju

3 Answers


You can use the OUTPUT clause to delete and return the data to be moved in one go.

create table ProductionTable
(
    ValueDate   datetime        not null
    , Data      varchar(max)    not null
)

insert ProductionTable values ('20100101', '3 years ago')
insert ProductionTable values ('20130425', 'this year')
insert ProductionTable values ('20130426', 'this year')

delete ProductionTable
output deleted.ValueDate, deleted.Data
where ValueDate <= dateadd(year, -3, getdate())

The code can also be accessed on SQLFiddle
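For scalability on larger tables, the same delete-and-output pattern can run in a loop with TOP, keeping each transaction small. A minimal sketch, assuming an ArchiveTable with the same columns exists in the same database as ProductionTable:

```sql
-- Batched variant: archive and delete at most 10,000 rows per iteration,
-- assuming ArchiveTable (ValueDate, Data) exists alongside ProductionTable.
while 1 = 1
begin
    delete top (10000) ProductionTable
    output deleted.ValueDate, deleted.Data
        into ArchiveTable (ValueDate, Data)
    where ValueDate <= dateadd(year, -3, getdate())

    if @@rowcount = 0 break   -- nothing left older than 3 years
end
```

Each iteration is atomic on its own, so a failure mid-run leaves no half-moved rows, and a rerun simply resumes where the loop stopped.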

Now I will show you the exact steps you need to follow in SSIS to reproduce the example:

  1. Create a new project and define your data sources for ProductionDB and ArchiveDB.
  2. In "Control Flow" tab, create a "Data Flow Task".
  3. In "Data Flow" tab, create an "OLE DB Source" and an "OLE DB Destination".
  4. In "OLE DB Source", select ProductionDB and choose "SQL command" as the data access mode. Paste in the delete statement with the output clause.
  5. Click on "Columns" and then OK.
  6. In "OLE DB Destination", select ArchiveDB and choose "Table or view - fast load" as the data access mode and then choose your ArchiveTable.
  7. Click on "Mappings" and then OK.
  8. Run the package and you should be able to verify that one row is deleted from ProductionTable and moved to ArchiveTable.

Hope it helps.

Other things to keep in mind: because you are deleting and moving data around, transactional consistency is very important. Imagine that halfway through your delete/move the server goes down; you would end up with data deleted but never written to the archive.

If you are unsure about how to protect your data by enforcing transactional consistency, please seek help from other SQL/SSIS experts on how to use transactions in SSIS.
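If you move the archival logic into T-SQL itself (for instance inside a stored procedure that SSIS calls), one way to get that consistency is an explicit transaction with TRY/CATCH. A sketch, assuming ArchiveDB lives on the same SQL Server instance and ArchiveTable matches the columns being output:

```sql
begin try
    begin transaction;

    -- Delete and archive in one statement; with the explicit transaction,
    -- any later statements in this block would roll back together with it.
    delete ProductionTable
    output deleted.ValueDate, deleted.Data
        into ArchiveDB.dbo.ArchiveTable (ValueDate, Data)
    where ValueDate <= dateadd(year, -3, getdate());

    commit transaction;
end try
begin catch
    if @@trancount > 0 rollback transaction;
    throw;  -- re-raise so the caller/SSIS sees the failure
end catch
```

Note that a single DELETE ... OUTPUT INTO statement is already atomic by itself; the explicit transaction starts to matter once the move spans more than one statement.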

Louie Bao
  • hi @louie, thanks for the wonderful response, it helps... Now I have a bigger thing to worry about, and I am going to ask you a very silly question: given the same task (data archival), what would you have done? Would you go for the same SSIS solution, or would you write a stored procedure which does the same thing? What I am asking is the advantages and disadvantages of an SSIS solution for data archival over the classic stored procedure. Please could you help me figure it out? The criteria I am looking for are scalability, complexity and failure handling. – Ujju Apr 29 '13 at 09:36
  • I need to clarify one thing up front: SSIS and stored procedures can co-exist; actually, they complement each other. Think of SSIS as a workflow engine and stored procedures as units of work controlled by the engine. If a stored procedure is a Walther PPK (pistol), then SSIS is 007. Continuing with the James Bond analogy, the beauty of SSIS (007) is that it was well trained to master many more weapons, such as emails, FTP and file system operations, to name a few. Hope my explanation helped with your understanding. – Louie Bao Apr 29 '13 at 11:38
  • The task at hand is something that can be achieved within SQL itself, so there is little reason to utilise SSIS. I am assuming that ProductionDB and ArchiveDB are located on the same SQL Server. I would create a stored procedure to capture as many activities as possible. Encapsulating everything in a stored procedure is a very powerful concept, as it keeps your code modular and hence enhances scalability. Since the DELETE statement with the OUTPUT clause is one ATOMIC operation, it is also reliable. Complexity-wise, you are only dealing with one object; it can't get any simpler. – Louie Bao Apr 29 '13 at 11:42
  • great analogy! All doubts cleared... loved the James Bond analogy. Great stuff! – Ujju Apr 29 '13 at 13:30

Create 2 OLE DB Connection Managers. Name them Production and Archive and have them point to the correct servers and databases. These CMs are what SSIS uses to push and pull data from the databases.

Add a Data Flow Task. A DFT is the executable that will allow row-by-row manipulation of the data. Double click on the Data Flow Task. Once inside, add an OLE DB Source and an OLE DB Destination to the canvas. The OLE DB Source is where the data will come from, while the OLE DB Destination provides the insert power.

The logic you would want to implement is a Delete first approach, much as I outlined in the other answer.

DELETE
    DF
OUTPUT
    DELETED.*
FROM
    dbo.DeleteFirst AS DF
WHERE
    DF.RecordDate < dateadd(year, -3, current_timestamp);

This query will delete all the rows older than 3 years and push them into the data flow. In your OLE DB Source, make the following configuration changes:

  1. change the Connection Manager from Archive to Production
  2. change the data access mode from "Table or view" to "SQL command"
  3. paste your query and click the Columns tab to double check the query parsed

Connect the OLE DB Source to the OLE DB Destination. Double click on the OLE DB Destination and configure it

  1. Verify the Connection Manager is the Archive
  2. Ensure the Access Mode is "Table or View - Fastload" (name approximate)
  3. You might need to check "Keep identity" based on your table design - if you have an identity column, check it if you want ID 10 from the production system to be ID 10 in the archive system
  4. Select the actual table
  5. On the Mapping tab, ensure that all the columns are mapped. It does this automatically by matching names, so there shouldn't be a problem.

If you do not need to span an instance, the above logic can be condensed into a single Execute SQL Task

DELETE
    DF
OUTPUT
    DELETED.*
    INTO ArchiveDatabase.dbo.DeleteFirst
FROM
    dbo.DeleteFirst AS DF
WHERE
    DF.RecordDate < dateadd(year, -3, current_timestamp);

Also note with this approach that if you have identity columns, you will need to provide an explicit column list and turn the IDENTITY_INSERT property on and off.
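As a sketch of that identity handling (the column names Id, RecordDate and Data are only illustrative here):

```sql
set identity_insert ArchiveDatabase.dbo.DeleteFirst on;

delete DF
output deleted.Id, deleted.RecordDate, deleted.Data  -- explicit column list
    into ArchiveDatabase.dbo.DeleteFirst (Id, RecordDate, Data)
from dbo.DeleteFirst as DF
where DF.RecordDate < dateadd(year, -3, current_timestamp);

set identity_insert ArchiveDatabase.dbo.DeleteFirst off;
```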

billinkc
  • thanks @billinkc for your time and reply, it helps. I have a few doubts about the different '-lities' (like reliability, scalability, performance, failure handling): how do these factors compare if I implemented the same solution by writing a separate script (let's say a stored procedure 'SP_DBBackup') instead of building an SSIS package? Which solution would give me better performance, and why? – Ujju Apr 29 '13 at 09:21

Have you considered table partitioning instead? You can move the old records to a totally different disk and still leave them available in the same table. It can also help with performance in some cases... all without an SSIS package.
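For reference, a minimal sketch of what that could look like with yearly range partitions; the partition key, boundary dates and filegroup names are all illustrative:

```sql
-- One partition per year; older years can live on a filegroup
-- backed by slower/cheaper disks.
create partition function pf_ByYear (datetime)
    as range right for values ('20110101', '20120101', '20130101');

create partition scheme ps_ByYear
    as partition pf_ByYear
    to (fg_archive, fg_archive, fg_current, fg_current);

-- The table is then created on the scheme, e.g.:
-- create table dbo.ProductionTable (... RecordDate datetime ...) on ps_ByYear (RecordDate);
```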

Nick.Mc
  • hi @electricllama, thanks for your suggestion. I did consider table partitioning, but somehow partitioning gives me a bad feeling (I don't know why). Maybe it's because I am at the initial stages of coding and I don't want to screw myself over now! Anyway, thanks for your kind suggestion. – Ujju Apr 29 '13 at 09:28
  • That's fine - if it's out of your comfort zone and you think you might have issues maintaining it, then it's better to go with what you know you can support. – Nick.Mc Apr 29 '13 at 09:36