
We have a couple of MySQL tables in RDS that are huge (over 700 GB) that we'd like to migrate to a DynamoDB table. Can you suggest a strategy, or a direction, for doing this in a clean, parallelized way? Perhaps using EMR or AWS Data Pipeline.

3 Answers


You can use AWS Data Pipeline. There are two basic templates: one for moving RDS tables to S3, and a second for importing data from S3 to DynamoDB. You can create your own pipeline that combines both templates.
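As a rough sketch of how the two template stages could be chained programmatically with boto3 (every name, role, table, and S3 path below is a placeholder, and the objects are heavily simplified relative to the console templates):

```python
import boto3

# A minimal sketch; all identifiers here are placeholders.
dp = boto3.client("datapipeline", region_name="us-east-1")

pid = dp.create_pipeline(name="rds-to-dynamodb",
                         uniqueId="rds-to-dynamodb-v1")["pipelineId"]

def fields(d):
    """Turn a dict into the field list put_pipeline_definition expects.
    String values prefixed with 'ref:' become object references."""
    return [{"key": k, "refValue": v[4:]} if v.startswith("ref:")
            else {"key": k, "stringValue": v} for k, v in d.items()]

objects = [
    {"id": "Default", "name": "Default", "fields": fields({
        "scheduleType": "ondemand",
        "role": "DataPipelineDefaultRole",
        "resourceRole": "DataPipelineDefaultResourceRole",
        "failureAndRerunMode": "CASCADE"})},
    {"id": "Ec2", "name": "Ec2", "fields": fields({
        "type": "Ec2Resource", "instanceType": "m3.xlarge",
        "terminateAfter": "4 Hours"})},
    {"id": "RdsDb", "name": "RdsDb", "fields": fields({
        "type": "RdsDatabase", "rdsInstanceId": "my-rds-instance",
        "username": "admin", "*password": "secret"})},
    # Stage 1: dump one RDS table to S3 (repeat these three per table).
    {"id": "SourceTable", "name": "SourceTable", "fields": fields({
        "type": "SqlDataNode", "table": "orders",
        "selectQuery": "SELECT * FROM orders", "database": "ref:RdsDb"})},
    {"id": "S3Dump", "name": "S3Dump", "fields": fields({
        "type": "S3DataNode",
        "directoryPath": "s3://my-bucket/exports/orders/"})},
    {"id": "DumpToS3", "name": "DumpToS3", "fields": fields({
        "type": "CopyActivity", "input": "ref:SourceTable",
        "output": "ref:S3Dump", "runsOn": "ref:Ec2"})},
    # Stage 2 (S3 -> DynamoDB) would reuse the objects from the
    # "Import DynamoDB backup data from S3" template, chained to DumpToS3
    # via dependsOn; data formats and per-table details are omitted here.
]

dp.put_pipeline_definition(pipelineId=pid, pipelineObjects=objects)
dp.activate_pipeline(pipelineId=pid)
```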

Regards

AGL
  • Thank you, it would be perfect if we could use these templates, but we have **2 MySQL tables** that we'd like to store as **a single DynamoDB table**. The templates have options for working with a single table, but not two. Is there a workaround for this that doesn't involve preprocessing? That would be our last resort: joining the 2 MySQL tables into one MySQL table would need a lot of time and space. – Ankit Kapur Mar 26 '16 at 01:13
  • Hi Ankit. In that case you will need to include an EMR cluster in the pipeline. The workflow should be: move both tables to S3 as separate CSV files, have an EMR cluster merge/join the tables and write the output to S3, and finally import that data into DynamoDB. You will need to develop a bit for the merge/join job; the import/export between S3 and EMR is easy with Hadoop commands. – AGL Mar 26 '16 at 22:51
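AGL mentions Hadoop; a Spark job on the same EMR cluster is one concrete way to do that merge/join step. A minimal PySpark sketch, where all S3 paths, column names, and the join key are assumptions:

```python
from pyspark.sql import SparkSession

# Assumed S3 paths, column names, and join key; adjust to your exports.
spark = SparkSession.builder.appName("merge-rds-exports").getOrCreate()

# The two CSV dumps produced by the RDS -> S3 stage.
orders = spark.read.csv("s3://my-bucket/exports/orders/", header=True)
customers = spark.read.csv("s3://my-bucket/exports/customers/", header=True)

# Join into the single denormalized shape the DynamoDB table will hold.
merged = orders.join(customers, on="customer_id", how="left")

# Write the merged output back to S3 for the S3 -> DynamoDB import stage.
merged.write.mode("overwrite").csv("s3://my-bucket/exports/merged/",
                                   header=True)
```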

One thing to consider with data this large is whether DynamoDB is the best option.

If this is statistical data or otherwise "big data", check out AWS Redshift, which might be better suited to your situation.

Dmitry Buslaev
  • We need to switch to a denormalized, schemaless table to accommodate certain use cases, so Dynamo is the better option for us. – Ankit Kapur Mar 26 '16 at 01:18
  • @AnkitKapur - DynamoDB is only good if you can pick a good hash key for your data! Do read the documentation and make sure you understand the limitations and best practices of DynamoDB before you go that route. – Mike Dinescu Mar 27 '16 at 20:18
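To make that point concrete, here is a minimal boto3 sketch of creating the target table; the key names are hypothetical, and the point is that the partition (hash) key should have high cardinality and be something every read can supply:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Hypothetical schema: a high-cardinality partition key (customer_id)
# plus a sort key, so related items group together and load spreads evenly.
dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_id", "KeyType": "RANGE"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 100,
                           "WriteCapacityUnits": 100},
)
```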

We have done similar work, and there is probably a better strategy for this: use AWS DMS and some prep tables within your source instance.

It involved two steps:

  1. Create new tables within your source instance that exactly match the DynamoDB schema, e.g. by merging multiple tables into one.

  2. Set up a DMS task with the prep tables as the source and DynamoDB as the target. Since the prep tables and the target schema now match, it should be pretty straightforward from this point (see the sketch below).
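A minimal boto3 sketch of step 2, assuming the source and target endpoints and the replication instance already exist; every ARN, schema, and table name below is a placeholder:

```python
import json

import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Map only the prep table(s); everything else in the source is excluded.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-prep-table",
        "object-locator": {"schema-name": "mydb",
                           "table-name": "prep_orders"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="prep-to-dynamodb",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",  # or "full-load-and-cdc" to keep syncing
    TableMappings=json.dumps(table_mappings),
)
```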

Hgottipati