0

Our company used to have onpremiere and cloud ERP systems and they finally wanted to pull data from each ERP systems to staging database in our datawarehouse. These two systems are identical interm of table strunctures and schemas however,we need to consolidate a few tables to create a unique table. Such as, dbo.Product from ERP1 and dbp.Product from ERP2 systems have overlapping productId(s) what would be the ideal case to merge these two tables without breaking data integratity

Example: ERP1.Product( ProductId int , ProductName varchar(90), Product varchar(MAX) )

ERP2.Product( ProductId int , ProductName varchar(90), Product varchar(MAX) )

enter image description here

S3S
  • 24,809
  • 5
  • 26
  • 45

2 Answers2

0

UNION would work

select * from ERP1 
union 
select * from ERP2

UNION returns distinct values where UNION ALL returns all, potentially duplicated values. Naturally, each statement must return the same number of columns, and have matching data types.

If you want to create the table on the fly, you can use an INTO clause on the first statement

select * from Server1.dbo.ERP1
into thisServer.dbo.StagingTable
union 
select * from Server2.dbo.ERP2
S3S
  • 24,809
  • 5
  • 26
  • 45
  • Thanks @scsimon I guess I might need to add one more condition here. We should be relying on ERP1 and if product is there we should select the one from ERP1 and if it is not there use product coming from ERP2. I was also curious if I needed to track the footprints like which products coming from which table. – ahmet ciftcioglu Apr 17 '18 at 20:56
  • If they are completely identical, why choose the one from ERP1 over 2? – S3S Apr 17 '18 at 20:58
0

The big question here is, when the product ids overlap, does this fact need to be respected on the merge or not? i.e. are they genuinely the same product, or is the overlap accidental?

If the overlap /does/ need to be respected, then I would suggest you will need to decide which is the primary source in the case of differences in product name/description, and then load the data warehouse table in a way that ignores duplicates coming from the secondary source, e.g.:

... where erp2product.productid not in (select erp1product.productid from erp1product) 

If there are no 'true' overlaps, then I would be inclined to stick a prefix on the front of the product id (or add a 'source' column) to differentiate them in the data warehouse, although obviously any other table in the data warehouse that contained the product id would have to follow the same rule.

MandyShaw
  • 1,088
  • 3
  • 14
  • 22
  • Good question. Yes! we are allowing overlaps because ERP2 is introduced after ERP1 and targeting different region however products will be the same products. – ahmet ciftcioglu Apr 17 '18 at 21:00