I have just started a project with some fairly daunting requirements. Company A uses an application that writes records to Company B. The task is to move/update/sync very large amounts of data (195 tables and 2500 gb of data, millions and millions of rows) from an AWS RDS sql instance (Company B) which I believe is sql 2017, to an on premise instance (Company A) which is 2016. The acceptable threshold for latency is <=5 mins. We only have read access to the source and cannot install anything there so the traditional means of replication are not available.
There is 1 central table we'll call table A, which has a primary key defined (TableAID). The rest of the tables have a foreign key relationship to Table A, and they also have their own primary keys defined with other relationships between them. The "gotcha" to all of this is that when there is an update to the source, all data is dropped and reinserted thereby creating all new primary keys, with the exception of TableAID. So TableAID is the only primary key that persists and can be counted on. The other tables will still maintain their relationships, but with different primary keys after an update. This makes updating the target with deltas very difficult with all of the one to many relationships. In addition to this, Company B will archive data from time to time and we at Company A will have to sync existing data while retaining the data that was archived and no longer part of the data stream.
We explored using SSIS for this but can't get anywhere close to the latency expectations. After some digging in other forum topics I ran across a recommendation to use MS StreamInsight. I am not familiar with this, but if it will work as a means of real-time replication, I can get up to speed. I am not tied to any particular technology, but having said that, my background is with the MS toolset. Before I invest tons of time with StreamInsight, I would like to get an idea if this is a viable solution to my problem. Any other recommendations are also welcome!
Thanks!