I have Postgresql db in RDS. I need to fetch data from a bunch of tables in postgresql db and push data into a S3 bucket every hour. I only want the delta changes (any new inserts / updates) to be sent in the hourly. Is it possible to do this using DMS or is EMR a better tool for performing this activity?
Asked
Active
Viewed 4,562 times
0
-
frankly saying I don't see any obvious way how you are going to do it with DMS or EMR... Please explain – Vao Tsun Apr 10 '18 at 07:13
-
@VaoTsun what would be the alternate way of doing this ? I thought of writing a spark job that would poll oracle dB at pre-determined intervals , create a file and push it to S3 – Punter Vicky Apr 10 '18 at 09:16
1 Answers
6
You can create an automated environment of migration data from RDS to S3 using AWS DMS (Data Migration Service) tasks.
- Create a source endpoint (reading your RDS database - Postgres, MySQL, Oracle, etc...);
- Create a target endpoint using S3 as an engine endpoint (read it: Using Amazon S3 as a Target for AWS Database Migration Service);
- Create a replication instance, responsible to make a bridge between source data and target endpoint (you will only pay while processing);
- Create a database migration task using the option 'Replication data change only' on migration type field;
- Create a cron lambda, which starts a DMS task, with stack Python following these instructions of this articles Lambda with scheduled events e Start DMS tasks with boto3 in Python.
Connecting these resources above you may can have what you want.
Regards,
Renan S.

rfschroeder
- 380
- 2
- 10