Redshift to Redshift ongoing sync using AWS Data Pipeline

Question

We would like to explore the possibility of replicating 'some of the tables' in our data warehouse on AWS Redshift into another Redshift cluster on an ongoing basis. Don't ask me why; the reasons involve different teams in our company wanting to do things differently and wanting access to "their data" using their preferred analytics tools.

Anyway, the requirements are as follows:

Source DB - Redshift
Target DB - Redshift
Only some tables need to be replicated: around 20% of the entire DB.
Source DB tables are refreshed every night, so the replication needs to happen after that.
A delay of at most 24 hours after the source DB has been updated would be acceptable to the business. For example, if the source Redshift is updated with the latest data on 01 Jan at 05:00 AM, the target Redshift is expected to have those updates by the next day, 02 Jan at 05:00 AM.
Volume of daily data updates (incremental adds/changes): around 15-20 GB per day.
Cost constraints: no major constraints. Assume that whatever $$ is needed would be approved.
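The 24-hour window in the requirements can be written down as a simple check that a scheduled job could run after each load. This is a minimal sketch, assuming the year (2022, which the question does not state) and that both timestamps are in the same timezone; `replication_deadline` and `within_sla` are hypothetical helpers, not part of any AWS service.

```python
from datetime import datetime, timedelta

# Maximum acceptable lag between the source refresh and the target load,
# taken from the requirement above (24 hours).
MAX_LAG = timedelta(hours=24)


def replication_deadline(source_refreshed_at: datetime) -> datetime:
    """Latest time by which the target Redshift should have the updates."""
    return source_refreshed_at + MAX_LAG


def within_sla(source_refreshed_at: datetime, target_loaded_at: datetime) -> bool:
    """True if the target load happened after the source refresh and within the window.

    Hypothetical helper for illustration; year and timezone are assumptions.
    """
    return source_refreshed_at <= target_loaded_at <= replication_deadline(source_refreshed_at)


# Example from the requirements: source updated 01 Jan 05:00 AM (year assumed).
source = datetime(2022, 1, 1, 5, 0)
print(replication_deadline(source))                     # 2022-01-02 05:00:00
print(within_sla(source, datetime(2022, 1, 2, 4, 30)))  # True
print(within_sla(source, datetime(2022, 1, 2, 6, 0)))   # False
```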

Wondering if it is possible to use AWS Data Pipeline to set up such a replication, or whether there are better ideas on how to achieve it. Thanks.

Redshift supports sharing tables across clusters. Also, Data Pipeline is not suited to ongoing replication; for that it's better to use Database Migration Service. — Marcin, Sep 13 '22 at 03:27
@Marcin Thanks, can you elaborate a bit more on data sharing across clusters? If it's just the same table being shared, I can imagine the owner of the source table having concerns about that table being shared with another cluster and about any potential performance issues. That's why we were considering a daily replication instead (during non-work hours), so users never touch the source table. I will read up on AWS DMS for this use case. Thanks for your advice. — rk0905, Sep 13 '22 at 03:39
@Marcin - Sorry, I just checked, and AWS DMS does not support Redshift as a source DB. So I'll probably need to look at the option of sharing tables across clusters. https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html#CHAP_Introduction.Sources.title — rk0905, Sep 13 '22 at 03:49
Oh. It's interesting that DMS does not support RS as a source. Will have to remember that. — Marcin, Sep 13 '22 at 03:50
Sounds like a perfect use case for Data Sharing (https://aws.amazon.com/about-aws/whats-new/2021/03/announcing-general-availability-of-amazon-redshift-data-sharing/) — MP24, Sep 13 '22 at 11:52
@MP24 Thanks for sharing this, let me take a look and see if it meets our needs. — rk0905, Sep 15 '22 at 07:48
@MP24 - I read up on datashares, and the team will try a POC to see if it meets our needs. Theoretically, yes, it seems to meet the requirements. — rk0905, Sep 20 '22 at 04:20
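For reference, the data sharing setup suggested in the comments comes down to a few SQL statements: on the producer (source) cluster, `CREATE DATASHARE`, `ALTER DATASHARE ... ADD SCHEMA/ADD TABLE` and `GRANT USAGE ON DATASHARE ... TO NAMESPACE`; on the consumer (target) cluster, `CREATE DATABASE ... FROM DATASHARE ... OF NAMESPACE`. The sketch below only builds those statements as strings so they can be reviewed before being run with whatever SQL client the team uses. The share name, table names, database name and namespace GUIDs are placeholders, table names are assumed to be schema-qualified, and both clusters are assumed to be in the same AWS account (cross-account sharing grants usage to an `ACCOUNT` rather than a `NAMESPACE`).

```python
from typing import List


def producer_statements(share: str, tables: List[str], consumer_namespace: str) -> List[str]:
    """SQL to run on the source (producer) cluster.

    `tables` must be schema-qualified, e.g. "sales.orders" (assumption).
    """
    schemas = sorted({t.split(".", 1)[0] for t in tables})
    stmts = [f"CREATE DATASHARE {share};"]
    # Add each schema before the tables in it, as in the AWS examples.
    stmts += [f"ALTER DATASHARE {share} ADD SCHEMA {s};" for s in schemas]
    stmts += [f"ALTER DATASHARE {share} ADD TABLE {t};" for t in tables]
    # Only the ~20% of tables listed here become visible to the consumer.
    stmts.append(f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';")
    return stmts


def consumer_statements(share: str, local_db: str, producer_namespace: str) -> List[str]:
    """SQL to run on the target (consumer) cluster."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    ]


if __name__ == "__main__":
    # Hypothetical names; each cluster's namespace GUID can be looked up
    # with `SELECT current_namespace;` on that cluster.
    tables = ["sales.orders", "sales.customers"]
    for sql in producer_statements("team_share", tables, "<consumer-namespace-guid>"):
        print(sql)
    for sql in consumer_statements("team_share", "team_db", "<producer-namespace-guid>"):
        print(sql)
```

According to the AWS documentation, queries issued on the consumer cluster run on the consumer's own compute, which bears on the performance concern raised earlier in the thread; that is worth confirming in the POC.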
