SQL Replication - Many Publishers to One Subscriber - Clustered Key

Question

We have a number of SQL instances which have the same structure. Transactional replication has been set up to publish these into a single Subscriber. (A data warehouse scenario.)

We are using MS SQL 2012 R2 with the standard transactional replication setup

Each publisher instance table has a identifier column which is not part of that tables primary or clustered key. On the Subscriber we have added the identifier column to the primary or clustered key. We now have issues on deletion where the rows submitted cannot be found in the Subscriber as they have already been removed by the first publishers deletion. We are missing that identifier column at source.

As the publisher instances are supplied by the ERP vendor, I don't want to modify these tables to include the identifier column in the clustered keys.

How can I add the additional identifier column to the clustered key through the replication process?

Please tag your question with the DBMS you are using. How have you set up the publisher/subscriber functionality - is it part of the core functionality of your DBMS, are you using a 3rd party product or have you built your own solution? — NickW, Mar 10 '23 at 18:26
We are using the standard SQL transactional replication process — ColinA, Mar 13 '23 at 09:08
Please clarify via edits, not comments. Please ask 1 (specific researched non-duplicate) question. How are you stuck doing this & what are you able to do? [ask] [Help] [mre] — philipxy, Mar 13 '23 at 09:36

score 0 · Answer 1 · answered Mar 13 '23 at 11:18

If I've understood your scenario correctly, I would build the process as follows:

For each source system, I would add a source system identifier to each dataset as it lands in the ingestion layer of your subscriber
Initially process data from each source system in isolation. So if a record is deleted in a source system, then flag that source system's record in your subscriber as deleted
Where the same data can be sourced from multiple systems (which I think is the scenario you are describing) determine what the source system precedence logic is. For example, if you have customer data in multiple systems
- Do changes from one system take precedence over changes in any other system (i.e. it is the "golden source")
- All source systems are equal and the latest change, regardless of source takes precedence
Build your next data layer (after the ingestion layer) using this logic
- So if a customer record has been deleted in Source A and this is reflected in your Subscriber, the fact that this record doesn't exist when you come to process a record from Source B wouldn't matter (and you handle this "gracefully") because you've defined Source A as the golden source for customers

BTW I probably wouldn't be deleting records in your subscriber, instead I would soft-delete them and then you wouldn't have these types of issues.

score 0 · Answer 2 · answered Mar 14 '23 at 15:40

Update

I have been unable to find a way to modify the primary key of the source tables BEFORE the data reaches the Publisher.

Instead I have configured a two stage replication process as follows.

Setup new separate subscriber databases for each publication.
Once replicated. Modify the keys on the new subscriber tables to include the server location id column.
Setup another publisher using the modified tables and add a single subscriber to these.

This works and the modified clustered keys are picked up by the second publisher.

A bit of a pain but it's working

SQL Replication - Many Publishers to One Subscriber - Clustered Key

2 Answers2