
I have been using AWS Data Pipeline to migrate data from DynamoDB to S3. The size of data is around 20 GB. Any thoughts on this ?

Amit
phaigeim

1 Answer


AWS Data Pipeline exports an entire DynamoDB table to one file in S3. This particular Data Pipeline template consumes a fraction of your table's provisioned read capacity, as defined by the MyExportJob.myDynamoDBReadThroughputRatio variable, and scales the MapReduce (EMR) cluster accordingly. You can set the read throughput ratio anywhere from 0 to 1 (0%-100%).
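If you drive the pipeline from a script, a minimal sketch with boto3 might look like the following. The pipeline ID is a placeholder, and the parameter IDs are assumed from AWS's stock "Export DynamoDB table to S3" template; verify both against your own pipeline definition before running.

```python
# Sketch only: activate an export pipeline built from the stock
# "Export DynamoDB table to S3" template, overriding its parameters.
# Pipeline ID and parameter IDs below are assumptions, not verified values.
import boto3

datapipeline = boto3.client("datapipeline", region_name="us-east-1")

parameter_values = [
    {"id": "myDDBTableName", "stringValue": "MyTable"},
    {"id": "myDDBRegion", "stringValue": "us-east-1"},
    {"id": "myOutputS3Loc", "stringValue": "s3://my-backup-bucket/dynamodb/"},
    # 1.0 = let the export consume up to 100% of the table's provisioned reads
    {"id": "myDDBReadThroughputRatio", "stringValue": "1.0"},
]

datapipeline.activate_pipeline(
    pipelineId="df-EXAMPLE1234567",  # placeholder pipeline ID
    parameterValues=parameter_values,
)
```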

If you have 20 GB of data, and Data Pipeline scans your table in parallel with MapReduce, the export will consume roughly 5,242,880 read capacity units (20 GB ÷ 4 KB per read unit). How long the backup takes is up to you: if you set the read throughput ratio to 1 and your table is provisioned at 11,988 reads per second, scanning the DynamoDB table should take around 5,242,880 / 11,988 ≈ 437 seconds (about 7 minutes and 17 seconds). The Data Pipeline job's runtime should be proportional to, and not much longer than, the time needed to scan the table; remember that Data Pipeline also has to start up a cluster and write the backup to S3.
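As a quick sanity check, here is the same estimate in a few lines. The 20 GB table size, 11,988 reads per second, and 1.0 ratio are just the example inputs from above; swap in your own table's values.

```python
# Back-of-the-envelope estimate of the table scan time, reproducing the
# arithmetic above. Only the 4 KB-per-read-unit rule is a DynamoDB constant;
# the other numbers are example inputs.
TABLE_SIZE_BYTES = 20 * 1024**3        # ~20 GB of table data
BYTES_PER_READ_UNIT = 4 * 1024         # one read capacity unit covers up to 4 KB
PROVISIONED_READS_PER_SEC = 11_988     # the table's provisioned read throughput
READ_THROUGHPUT_RATIO = 1.0            # MyExportJob.myDynamoDBReadThroughputRatio

total_read_units = TABLE_SIZE_BYTES / BYTES_PER_READ_UNIT            # ~5,242,880
effective_read_rate = PROVISIONED_READS_PER_SEC * READ_THROUGHPUT_RATIO
scan_seconds = total_read_units / effective_read_rate                # ~437 s

print(f"read units consumed: {total_read_units:,.0f}")
print(f"estimated scan time: {scan_seconds:,.0f} s (~{scan_seconds / 60:.0f} min)")
```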

Alexander Patrikalakis