My problem is here to create a replication pipeline that replicates tables and data from MySql RDS to Redshift and I cannot use any managed service. Also, any new updates in RDS should be replicated in the redshift tables as well.
After looking at my many solutions, I came to an understanding of the following steps:
- Create flat files/CSVs from MySql RDS and save them in S3.
- Use Redshift's COPY command to copy data in staging tables and then finally save it to the main tables.
- Now, for the update part, every time I will push the CSVs to S3 and step 2 will be repeated.
So, I just wanted to confirm if the above approach is fine? As, every time when an update happens, will the old data be deleted completely and replaced by the new or is it possible to just update the necessary records. If yes, then how?
Any help will be really appreciated. Thanks in advance.