The docs confirm your understanding of how DynamoDB scales out in the On-Demand capacity mode (emphasis mine):
> Peak Traffic and Scaling Properties
>
> [...] On-demand capacity mode instantly accommodates up to double the previous peak traffic on a table. For example, if your application’s traffic pattern varies between 25,000 and 50,000 strongly consistent reads per second where 50,000 reads per second is the previous traffic peak, on-demand capacity mode instantly accommodates sustained traffic of up to 100,000 reads per second. If your application sustains traffic of 100,000 reads per second, that peak becomes your new previous peak, enabling subsequent traffic to reach up to 200,000 reads per second.
>
> If you need more than double your previous peak on the table, DynamoDB automatically allocates more capacity as your traffic volume increases to help ensure that your workload does not experience throttling. However, throttling can occur if you exceed double your previous peak within 30 minutes. For example, if your application’s traffic pattern varies between 25,000 and 50,000 strongly consistent reads per second where 50,000 reads per second is the previously reached traffic peak, DynamoDB recommends spacing your traffic growth over at least 30 minutes before driving more than 100,000 reads per second.
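To make the rule concrete, here's a minimal sketch of that behavior in Python. The function name and request rates are illustrative; only the "double the previous peak" and "30 minutes" figures come from the quote above.

```python
def may_throttle(requested_rps: float, previous_peak_rps: float) -> bool:
    """On-demand instantly absorbs up to double the previous peak; pushing
    past that within roughly 30 minutes risks throttling (per the docs above)."""
    return requested_rps > 2 * previous_peak_rps

# With a previous peak of 50,000 reads per second:
print(may_throttle(90_000, 50_000))   # False -> absorbed instantly
print(may_throttle(120_000, 50_000))  # True  -> ramp up over 30+ minutes instead
```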
Concerning the strategy of setting the initial peak value for new tables by first deploying them in Provisioned Capacity mode with large RCU/WCU values and then switching them over to On-Demand: that also works. When you switch, DynamoDB sets the starting value for the previous peak to half the provisioned RCUs/WCUs, and since double the previous peak is always supported, you retain the throughput you originally provisioned.
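A sketch of how that could look with boto3 (the table name, key schema, and the 40,000 RCU/WCU figures are just placeholders): create the table in provisioned mode, wait for it to become active, then flip the billing mode to on-demand.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# 1. Create the table in provisioned mode with a high RCU/WCU ceiling.
dynamodb.create_table(
    TableName="my-table",
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 40_000, "WriteCapacityUnits": 40_000},
)

# Wait until the table is ACTIVE before changing the billing mode.
dynamodb.get_waiter("table_exists").wait(TableName="my-table")

# 2. Switch to on-demand. The previous peak starts at half the provisioned
#    values (20,000 RCU / 20,000 WCU here), and double that is always
#    available, so the original throughput is retained.
dynamodb.update_table(
    TableName="my-table",
    BillingMode="PAY_PER_REQUEST",
)
```

Keep in mind that you can only switch a table between capacity modes once every 24 hours.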
The docs don't explicitly state that this elevated peak persists indefinitely, but they also don't mention scaling back down, and in practice I haven't seen that happen either. In my experience, AWS wouldn't leave a behavior like that out of the docs.
A scale-down is also unlikely given the architecture of DynamoDB, which AWS explains in a really cool tech talk from re:Invent 2018. DynamoDB scales in partitions, and the number of partitions for a table can only increase. Each storage partition is capable of:
- Serving up to 3,000 RCUs
- Serving up to 1,000 WCUs
- Storing 10 GB of data
As soon as any of those limits is reached, a partition split happens - two new partitions are created and the data is distributed among them. This happens as many times as necessary until the newly configured parameters (RCU, WCU, storage) can be accommodated.
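For intuition, here is a rough back-of-the-envelope model based on those per-partition limits. It's a simplification, not an official formula, and the actual partition-placement logic isn't published.

```python
import math

def estimate_partitions(rcu: int, wcu: int, size_gb: float) -> int:
    """Rough lower bound on partition count from the per-partition limits
    above (3,000 RCU, 1,000 WCU, 10 GB). Simplified model, not an AWS API."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(by_throughput, by_size, 1)

# Example: 12,000 RCU and 4,000 WCU on 150 GB of data
# -> max(ceil(4 + 4), ceil(15)) = 15 partitions at minimum.
print(estimate_partitions(12_000, 4_000, 150))  # 15
```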
It's not stated explicitly, but since you can switch between on-demand and provisioned capacity almost instantly and in either direction, it's fair to assume that the underlying architecture is the same, or at least very similar, with a different billing model on top of it.
Since the number of partitions can only ever go up, it's unlikely that the peak capacity will go down.
That being said: it's not part of the published API and is considered an implementation detail, so there is no guarantee or promise that it will always stay like this.