I have exported and transformed 340 million rows from DynamoDB into S3. I am now trying to import them back into DynamoDB using the Data Pipeline.

I have my table write provisioning set to 5600 capacity units, but I can't seem to get the pipeline to use more than 1000-1200 of them (it's hard to pin down the true number because of the granularity of the metric graph).
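
For anyone trying to pin down the real number: the console graph averages over its display period, but you can query CloudWatch directly at one-minute granularity. A minimal sketch using boto3; the table name 'my-table' is a placeholder:

    import boto3
    from datetime import datetime, timedelta

    cloudwatch = boto3.client('cloudwatch')

    # ConsumedWriteCapacityUnits is reported as a Sum per period;
    # divide by the period length to get consumed WCU per second.
    resp = cloudwatch.get_metric_statistics(
        Namespace='AWS/DynamoDB',
        MetricName='ConsumedWriteCapacityUnits',
        Dimensions=[{'Name': 'TableName', 'Value': 'my-table'}],  # placeholder
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=60,
        Statistics=['Sum'],
    )

    for point in sorted(resp['Datapoints'], key=lambda p: p['Timestamp']):
        print(point['Timestamp'], point['Sum'] / 60.0, 'WCU/sec')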

I have tried increasing the number of slave nodes as well as the instance size for each slave node, but nothing seems to make a difference.

Does anyone have any thoughts?

Garet Jax
  • What value are you using for myDDBWriteThroughputRatio ('DynamoDB write throughput ratio') in your pipeline? It should be '1' if you want to use all of your capacity. I think it defaults to 0.25, so it sounds like you might still be using that? (See the sketch after these comments.) – F_SO_K Mar 08 '19 at 11:15
  • Thanks for the thought. The default is 0.25 (I actually changed the default to be 1) and the throughput ratio is set to 1. – Garet Jax Mar 08 '19 at 11:38
  • @GaretJax it turns out this question is super useful for DynamoDB in general, nothing to do with Data Pipeline. Would you mind changing the title to something like "DynamoDB data loading is too slow; not respecting Provisioned Write Capacity". This was the only place on the internet where we found this solution (after a lot of looking). – Francis Upton IV Jul 04 '20 at 18:35
  • @Francis Upton IV - Done. Thanks for the heads up. – Garet Jax Jul 05 '20 at 19:21
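
For context on the throughput-ratio comment above: the Data Pipeline import templates expose the ratio as a pipeline parameter, and parameter values can be supplied when the pipeline is activated. A minimal sketch with boto3, assuming the template exposes myDDBWriteThroughputRatio; the pipeline ID 'df-EXAMPLE' is a placeholder:

    import boto3

    datapipeline = boto3.client('datapipeline')

    # Activate the pipeline with the write throughput ratio set to 1.0,
    # so the EMR job is allowed to consume the table's full write capacity.
    # 'df-EXAMPLE' is a placeholder pipeline ID.
    datapipeline.activate_pipeline(
        pipelineId='df-EXAMPLE',
        parameterValues=[
            {'id': 'myDDBWriteThroughputRatio', 'stringValue': '1.0'},
        ],
    )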

1 Answer

The problem was a secondary index on the table. Regardless of the write provisioning level I chose and the number of machines in the EMR cluster, I couldn't get more than 1000 or so capacity units consumed. I had the provisioning set to 7000, so 1000 was not acceptable.

As soon as I removed the secondary index, the write provisioning maxed out.
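
For reference, if someone else hits this: a global secondary index can be dropped before the bulk load and recreated afterwards without touching the table itself. A minimal boto3 sketch; the table and index names are placeholders:

    import boto3

    dynamodb = boto3.client('dynamodb')

    # Drop the GSI before the bulk import; recreate it afterwards so the
    # index backfill happens once instead of on every individual write.
    # 'my-table' and 'my-index' are placeholder names.
    dynamodb.update_table(
        TableName='my-table',
        GlobalSecondaryIndexUpdates=[
            {'Delete': {'IndexName': 'my-index'}},
        ],
    )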

Garet Jax
  • What were the capacity settings on the secondary index? If it is set lower than the table, it can create back pressure and throttle the table, as DynamoDB cannot keep the index up to date. You should always endeavor to have the provisioned capacity set the same on both to avoid this (a quick way to check is sketched below). – NoSQLKnowHow Mar 12 '19 at 22:49
  • Both were set to the same provisioning level. When I removed the index, it was like a tidal wave. The usage went from ~1000 up to the full 7000. – Garet Jax Mar 14 '19 at 01:20
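
To check for the mismatch NoSQLKnowHow describes, describe_table returns the provisioned throughput for the table and for each GSI side by side. A minimal sketch; 'my-table' is a placeholder:

    import boto3

    dynamodb = boto3.client('dynamodb')

    # Compare the table's provisioned write capacity with each GSI's.
    # A GSI provisioned lower than the table can throttle the table's writes.
    # 'my-table' is a placeholder name.
    desc = dynamodb.describe_table(TableName='my-table')['Table']

    print('table WCU:', desc['ProvisionedThroughput']['WriteCapacityUnits'])
    for gsi in desc.get('GlobalSecondaryIndexes', []):
        print(gsi['IndexName'], 'WCU:',
              gsi['ProvisionedThroughput']['WriteCapacityUnits'])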