Basic SSIS hierarchy move

Question

I have a relational database in which I'd like to copy a subset of hierarchical data from one database to another. I've found lots of information about moving or copying single tables with all sorts of transformations, but not a lot on basic hierarchical moving.

I have three questions about doing this in SSIS.

Will uniqueidentifier IDs make this job easier than auto incremented integer IDs?
How do I copy a basic Parent/Child in SSIS?
How do I copy a a more complex Parent/Child/Parent2?

Question 1: Uniqueidentifier IDs

One solution to a question about moving data described creating clever staging tables to create new IDs. This led me to think that SQL UniqueID has an advantage over integer auto increment ID, since one can simply set "Keep Identity" without collision concerns. Is this true?

Question 2: Moving a basic hierarchy

Here's a simplified schema of my database (Parent<-Child):

OptimizationSets<-AllocationLimits

Assets<-AllocationLimits

Assets<-Returns

I'd like to choose some rows from the OptimizationSets table to copy from one database to another. In addition, I'd like to copy the corresponding child rows in AllocationLimits. I can use the OLE DB source to select the desired parent rows easily, but selecting the child rows doesn't seem obvious. The lookup function only returns one child row rather than all of them.

Question 3: Complex relationships

In addition to the selected rows in AllocationLimits, I'd like to copy the related parent rows in Assets, as well as the related returns.

Do you need to be able to audit back to your original database records (preserve the original ID?) Will this be happening a lot? Do you envisage merging a number of these databases together? The more Yes's, the more reason to use a uniqueidentifer. With regards to moving the data, if you search on SSIS you'll find all kinds of wonderfully convoluted ways to do it, but you should be able to simply write three select statements to extract the data you want then load that into the target database. If the database is on the same server you don't even need SSIS. — Nick.Mc, Aug 20 '15 at 00:28
There is no auditing, but this will occur regularly. It will only be 2 DBs. I'm not sure how to easily and properly handle the primary and foreign keys so that relationships are preserved unless I preserve the original keys. — jlear, Aug 20 '15 at 01:15
Also, I'm not sure I understand how to create the three SQL select statements so that only the related data is selected. For example, the top of the hierarchy OptimizationSets is easy, because I will know the primary key I want. I can see that I could select the AllocationLimit rows using that key as well, but then I'll have a bunch of AllocationLimit rows for each of which I want the parent Asset rows (and then the children of the Asset rows, and so on). Can this be done without MergeJoin or Lookup blocks? — jlear, Aug 20 '15 at 01:23
BTW, 1) it is two different servers, and 2) thanks for the reply. — jlear, Aug 20 '15 at 01:26
Is there only one level (i.e. the tables don't recursively reference themselves) Unless I'm missing something the SQL to do this is pretty straightforward. I'll happily post as an answer (and make some assumptions about field names) — Nick.Mc, Aug 20 '15 at 03:09
There are two basic ways to avoid the key collisions: 1. Add another column and make a composite unique key (usually not practical because foreign key constraints can't use a composite key) 2. Go through a key remapping exercise (for example use a 'surrogate key'). This means loading all the data into staging tables, remapping them in the staging table and loading into the target. Or you could achieve the same thing in SSIS but IMHO it would be very complicated. You should pick the method (SSIS or SQL) that you can maintain in future. — Nick.Mc, Aug 20 '15 at 03:31
GUIDs seem a lot more attractive now. Thanks, @Nick.McDermaid. — jlear, Aug 20 '15 at 16:34

score 0 · Accepted Answer · answered Aug 20 '15 at 00:04

It's not a complete answer, but the hierarchical moves can be accomplished using Merge Join and lookup blocks. Some of the data streams need to be sorted, which can be accomplished with SQL statements in the blocks or sort blocks.

Here's a bit of the flow (sorry, can't post pictures since I'm a newbie):

OptimizationSets source -> Multicast 1
Multicast 1 -> OptimizationsSets destination
Multicast 1 -> MergeJoin 1
AllocationLimits source -> MergeJoin 1
MergeJoin 1 -> Multicast 2
Multicast 2 -> AllocationLimits destination
Multicast 2 -> sort -> Lookup Assets
Lookup Assets -> Multicast 3
Multicast 3 -> Assets destination
Multicast 3 -> MergeJoin 3
Returns source -> MergeJoin 3
MergeJoin 3 -> Returns destination

There are some details left out, such as some SQL WHERE and ORDER BY clauses on the sources, but the general flow is pretty straightforward. Getting children and parents can be accomplished with the MergeJoin and Lookup blocks.

Basic SSIS hierarchy move

Question 1: Uniqueidentifier IDs

Question 2: Moving a basic hierarchy

Question 3: Complex relationships

1 Answers1