0

I have a challenge which involves merging two applications which reside entirely in separate SQL server databases. Each database maintains its own internal database ID's which are sequenced integers in the 5,000,000 range. I am using a tool which imports the data via standard API's.

The key challenge I'm facing has to do with referential integrity with parent and child relationships once the datasets are merged. Please refer to the table below for an example. Each application has identical data structures, but their data is unique. We would like all of the data to reside in one system.

One example is with the project object, which has multiple tasks, and each task can have multiple resources assigned. In the parenthesis is the internal database sequence ID (primary key) for each object type. Each project is unique, each task is unique per project, but the same resource can be assigned across projects and tasks.

System 0

Project 1 (PROJECT ID 5000001)                  
    Task A ( TASK ID 5000001)                   
        Resource X (RESOURCE ID 5000001)            
        Resource Y (RESOURCE ID 5000002)        
    Task B ( TASK ID 5000002)                   
        Resource Y (RESOURCE ID 5000002)            
        Resource Z (RESOURCE ID 5000003)            

Project 2 (PROJECT ID 5000002)                  
    Task A (TASK ID 5000003)                    
        Resource Z (RESOURCE ID 5000003)            
    Task B (TASK ID 5000003)                    
        Resource X (RESOURCE ID 5000001)            

System 1

Project 3 (PROJECT ID 5000001)  
    Task C ( TASK ID 5000001)   
        Resource F (RESOURCE ID 5000001)    
        Resource G (RESOURCE ID 5000002)    
    Task D ( TASK ID 5000002)   
        Resource G (RESOURCE ID 5000002)    
        Resource H (RESOURCE ID 5000003)    

Project 4 (PROJECT ID 5000002)  
    Task A (TASK ID 5000003)    
        Resource H (RESOURCE ID 5000003)    
    Task B (TASK ID 5000004)    
        Resource F (RESOURCE ID 5000001)

From the data above, you can see how merging Project 1 from System 0 would overwrite Project 3 in the target System 1 due to the pre-existing PROJECT ID 5000001.

My question is how to merge the data while maintaining referential integrity? My first thought is to create a view which combines both data sets from both systems for each object and somehow update the records that will be imported, and also store the new internal ID and cascade that new reference ID to the related objects. Is there a simpler approach?

Are there tools which can automate this?

slm
  • 15,396
  • 12
  • 109
  • 124

3 Answers3

1

I would suggest to use Redgate sql data compare tool. Its not free but worth it though. Here is its link

Vasanth
  • 1,670
  • 1
  • 13
  • 17
0

Since you only have 2, one option is to "go negative".

If you're using int or big int and your surrogate key data types.....AND you started all the seeding at (1,1) (or similar), you can negative'fy the surrogates of one database.

However, you might have client code that checks for ">0", and that could screw you.

Remember, the range is actually:

MySurrogateKey  int IDENTITY (-2147483648,1) 

(to positve 2147483648 )...it doesn't have to start at 1 (or 0).

granadaCoder
  • 26,328
  • 10
  • 113
  • 146
0

This is not going to be particularly easy, but you can carry out the whole operation in a single (large) stored procedure.

What I would do (personal approach, others might do it differently) is create an SQL Server instance with each of the existing databases and three more that are initially 'empty'. I would write a stored procedure that would read all of the tables from the first database and write them to the third database, adding 10,000,000 (ten million) to the ID fields. This your 5000001 would become 15000001. Subsequent steps in the stored procedure would update all foreign keys that pointed to these PKs to add 10,000,000 to each of them.

You would follow a similar process by adding 20,000,000 to the ID fields of the second project database, and you would write out the newly renumbered tables to database 4. Then you would update the foreign key fields as described for the earlier case. What you have at the completion of this phase is records that are distinct within the combined projects.

You would then insert the selects from the respective database 3 and database 4 tables into their database 5 target. This would combine the tables into a globally distinct singularity.

Meredith Poor
  • 351
  • 2
  • 9