We would like to explore the possibility of replicating 'some of the tables' our data warehouse on AWS Redshift into another Redshift cluster in an ongoing manner. Don't ask me why as the reasons involve different teams wanting to do do things differently in our company and wanting access to "their data" using their preferred analytic tools.
Anyways, the requirement is as follows:
- Source DB - Redshift
- Target DB - Redshift
- Only some tables - around 20% of the entire DB to be replicated.
- Source DB tables are refreshed every night so the replication needs to happen after that.
- It would be acceptable to the business to have a delay of maximum 24 hours after the source DB has been updated. For example if the source Redshift is updated with latest data on 01 Jan at 05:00 AM, it is expected that the target Redshift has those updates by the next day - 02 Jan 05:00 AM.
- Volume of daily data updates (incremental adds/changes) - around 15-20 GB daily.
- Constraints on cost: No major constraints around this. Assume that whatever $$ would be needed would be approved.
Wondering if it is possible to use AWS data pipeline to set up such a replication. Or any better ideas on how such a replication can be achieved. Thanks.