0

I have Postgresql db in RDS. I need to fetch data from a bunch of tables in postgresql db and push data into a S3 bucket every hour. I only want the delta changes (any new inserts / updates) to be sent in the hourly. Is it possible to do this using DMS or is EMR a better tool for performing this activity?

Punter Vicky
  • 15,954
  • 56
  • 188
  • 315
  • frankly saying I don't see any obvious way how you are going to do it with DMS or EMR... Please explain – Vao Tsun Apr 10 '18 at 07:13
  • @VaoTsun what would be the alternate way of doing this ? I thought of writing a spark job that would poll oracle dB at pre-determined intervals , create a file and push it to S3 – Punter Vicky Apr 10 '18 at 09:16

1 Answers1

6

You can create an automated environment of migration data from RDS to S3 using AWS DMS (Data Migration Service) tasks.

  1. Create a source endpoint (reading your RDS database - Postgres, MySQL, Oracle, etc...);
  2. Create a target endpoint using S3 as an engine endpoint (read it: Using Amazon S3 as a Target for AWS Database Migration Service);
  3. Create a replication instance, responsible to make a bridge between source data and target endpoint (you will only pay while processing);
  4. Create a database migration task using the option 'Replication data change only' on migration type field;
  5. Create a cron lambda, which starts a DMS task, with stack Python following these instructions of this articles Lambda with scheduled events e Start DMS tasks with boto3 in Python.

Connecting these resources above you may can have what you want.

Regards,

Renan S.

rfschroeder
  • 380
  • 2
  • 10