I have a local SQL Server database from which I copy large amounts of data into a remote SQL Server database. The local server runs SQL Server 2008 and the remote server runs 2012.

The remote DB has transactional replication set up to one local DB and another remote DB. This all works perfectly.

I have created an SSIS package that empties the destination tables (in the remote DB) and then uses a Data Flow object to add the data from the source. For flexibility, each table sits in its own Sequence Container (this allows me to run one or many tables at a time). The data flow settings are set to Keep Identity.
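
The "empty" step amounts to a plain DELETE per table, something like this (dbo.BigTable stands in for the real table names):

    -- Run per table from an Execute SQL Task; dbo.BigTable is a placeholder.
    -- While a table is published for replication, TRUNCATE TABLE is not
    -- allowed on it, so the empty step has to be a DELETE unless the
    -- replication settings have been dropped first.
    DELETE FROM dbo.BigTable;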

Currently, prior to running the SSIS package, I drop the replication settings and then run the package. Once the package completes, I then re-create the replication settings and reinitialise the subscribers.
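
Simplified, the drop / re-create scripts look something like this (publication, article, and subscriber names here are placeholders for the real ones):

    -- Sketch only: 'MyPub', 'BigTable', 'RemoteSub' and 'TargetDb' are
    -- placeholder names.

    -- Before the load: drop the subscriptions, then the publication.
    EXEC sp_dropsubscription @publication = N'MyPub',
                             @article     = N'all',
                             @subscriber  = N'all';
    EXEC sp_droppublication @publication = N'MyPub';

    -- ...run the SSIS package here...

    -- After the load: re-create the publication, articles and subscriptions,
    -- then reinitialise the subscribers with a fresh snapshot (assumes the
    -- snapshot agent job already exists).
    EXEC sp_addpublication @publication = N'MyPub', @status = N'active';
    EXEC sp_addarticle @publication   = N'MyPub',
                       @article       = N'BigTable',
                       @source_object = N'BigTable';
    EXEC sp_addsubscription @publication    = N'MyPub',
                            @subscriber     = N'RemoteSub',
                            @destination_db = N'TargetDb';
    EXEC sp_startpublication_snapshot @publication = N'MyPub';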

I do it this way (deleting the replication and then re-creating it) for fear of overloading the server with replication commands. Although most tables hold between tens and thousands of rows, a couple of them are in excess of 35 million.

Is there a recommended way of emptying and re-loading the data of a large replicated database?

I don't want to replicate my local DB to the remote DB, as that would not always be appropriate, and a backup and restore of the local DB would also not work, due to the more complex permissions, etc. on the remote DB.

It's not the end of the world to drop and re-create the replication settings each time as I have it all scripted. I'm just sure that there must be a recommended way of managing this...

TechyGypo
  • If you're running Enterprise Edition, check out CDC (Change Data Capture). It basically just logs data changes, thus reducing the amount of data you'd need to load. – SchmitzIT Dec 11 '14 at 11:31
  • @SchmitzIT Thanks. But it's not EE :( – TechyGypo Dec 11 '14 at 11:36
  • As SchmitzIT says, CDC could help you (we are using it for the same purpose). Alternatively, you could only add/update changed data rather than overwriting the whole table: create a DML trigger to capture all changes to the table's data, then process the captured changes in your SSIS package in three passes: deletes, updates, and inserts. – Timofey G Morozov Dec 11 '14 at 11:37
  • 1
    Like @TomTom suggested, then use delta loads. It's fairly easy to create a hash of all the values of a row and compare that at the target. Better yet, if you have a LastModified date, that can be used to only move data changed later than last update across. – SchmitzIT Dec 11 '14 at 11:37
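
A minimal sketch of that hash-compare idea (dbo.BigTable, staging.BigTable and their columns are placeholder names; CONCAT needs SQL Server 2012 or later, which the remote server here is):

    -- Only rows whose hash differs are touched, so unchanged rows never
    -- generate replication commands. Note that HASHBYTES input is capped
    -- at 8000 bytes before SQL Server 2016.
    UPDATE t
    SET    t.Col1 = s.Col1,
           t.Col2 = s.Col2
    FROM   dbo.BigTable     AS t
    JOIN   staging.BigTable AS s ON s.Id = t.Id
    WHERE  HASHBYTES('SHA1', CONCAT(s.Col1, '|', s.Col2))
        <> HASHBYTES('SHA1', CONCAT(t.Col1, '|', t.Col2));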

1 Answer

Don't do it. Empty/reload is bad. Update the table via merge instead; that way you avoid the delete and re-insert, which amounts to two replicated operations per row. Load the new data into temp tables on the other server (not replicated), then merge them into the replicated tables. If a lot of the data is unchanged, this will seriously reduce the replication load.
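
A minimal sketch of that approach, assuming placeholder names (dbo.BigTable is the replicated target, staging.BigTable the non-replicated staging copy loaded by SSIS, and Id an identity column, since the load keeps identity values):

    SET IDENTITY_INSERT dbo.BigTable ON;  -- Id is an identity column

    MERGE dbo.BigTable AS tgt
    USING staging.BigTable AS src
          ON tgt.Id = src.Id
    -- Update only rows that actually changed; note the <> tests skip rows
    -- where one side is NULL, so extend them if these columns are nullable.
    WHEN MATCHED AND (tgt.Col1 <> src.Col1 OR tgt.Col2 <> src.Col2) THEN
        UPDATE SET tgt.Col1 = src.Col1,
                   tgt.Col2 = src.Col2
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Col1, Col2)
        VALUES (src.Id, src.Col1, src.Col2)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;  -- rows gone from the source are removed at the target

    SET IDENTITY_INSERT dbo.BigTable OFF;

Only the rows the MERGE actually changes turn into replication commands, which is the whole point: the bulk load into the staging table never replicates at all.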

TomTom
  • It does feel a bit naughty doing the empty/reload! I presume by merge you mean: http://msdn.microsoft.com/en-us/library/bb510625.aspx I had not thought of that and I like it :) – TechyGypo Dec 11 '14 at 11:44
  • Yes, I mean exactly that - the merge statement. Dropping data just to recreate it, and having that recreation then replicate out, sounds quite suboptimal. You also have data downtime (time with no data), which you can mostly avoid with merge. – TomTom Dec 11 '14 at 11:48