0

I have 2 SQL Server databases - publisher and subscriber. Since the subscriber is being used for statistics I have to provide anonymization for some records.

My idea is to:

  • Load data into the replica with Snapshot Replication (that action creates schema and some stored procedures for insert/update/delete).

  • Modify created stored procedures responsible for insert/update (like sp_MSins_projecttsttable_name) and put there anonymization logic.

  • Use Transactional Replication for further synchronization.

With that approach every time a new record is inserted into the table, SQL Server inserts its anonymized version to the subscriber database - replication uses custom procedures.

However, initial load (Snapshot Replication) is an issue here since stored procedures are created in original form during that process. I know I can perform Snapshot Replication and then run some scripts with anonymization logic, however I would rather avoid that.

My question:

  • is it possible to apply additional logic to the Snapshot Replication which modifies records on-the-fly?

Or maybe you know another solution for that and I should try a different approach? I am not sure if code is necessary:

Example article:

EXEC sp_addarticle @publication = N'ProjectPublication',
                   @article = N'table_name',
                   @source_owner = N'projecttst',
                   @source_object = N'table_name',
                   @type = N'logbased',
                   @description = NULL,
                   @creation_script = NULL,
                   @pre_creation_cmd = N'drop',
                   @schema_option = 0x000000000803509F,
                   @identityrangemanagementoption = N'manual',
                   @destination_table = N'table_name',
                   @destination_owner = N'projecttst',
                   @vertical_partition = N'false',
                   @ins_cmd = N'CALL sp_MSins_projecttsttable_name',
                   @del_cmd = N'CALL sp_MSdel_projecttsttable_name',
                   @upd_cmd = N'SCALL sp_MSupd_projecttsttable_name';

Original and modified procedure:

ALTER PROCEDURE [dbo].[sp_MSins_projecttsttable_name]
    @c1 numeric(19,0),
    @c2 nvarchar(255)
AS
BEGIN
    INSERT INTO [projecttst].[table_name] ([col1], [col2]) 
    VALUES (@c1, @c2) 
END

ALTER PROCEDURE [dbo].[sp_MSins_projecttsttable_name]
    @c1 numeric(19,0),
    @c2 nvarchar(255)
AS
BEGIN
    INSERT INTO [projecttst].[table_name] ([col1], [col2]) 
    VALUES (@c1, 'ANONYMIZED') 
END

Thanks in advance.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Maciej Skrzypiński
  • 557
  • 1
  • 4
  • 10
  • 2
    why not create a view over the tables(s) to perform anonymization and lock down tables with permissions? – Mitch Wheat Sep 28 '21 at 14:40
  • Some of the report queries might be heavy - I have to avoid unnecessary load on operational DB. – Maciej Skrzypiński Sep 28 '21 at 14:54
  • 3
    Side note: you should **not** use the `sp_` prefix for your stored procedures. Microsoft has [reserved that prefix for its own use (see *Naming Stored Procedures*)](http://msdn.microsoft.com/en-us/library/ms190669%28v=sql.105%29.aspx), and you do run the risk of a name clash sometime in the future. [It's also bad for your stored procedure performance](http://www.sqlperformance.com/2012/10/t-sql-queries/sp_prefix). It's best to just simply avoid `sp_` and use something else as a prefix - or no prefix at all! – marc_s Sep 28 '21 at 15:30

0 Answers0