Application:
I have a Java app whose input arrives in spikes/bursts of events. For each event there is always a read, then a decision (logic), and then a possible write (insert or update) based on that decision. On a day-to-day basis, the majority of reads do not result in a write. In theory this is a good fit for DAX and DynamoDB with On-Demand billing/capacity.
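For context, here is a minimal sketch of the per-event flow. The table name, key, and attribute names are hypothetical, and the DAX client is used through the `AmazonDynamoDB` interface it implements; it is an illustration of the pattern, not the exact production code.

```java
import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

public class EventProcessor {

    private final AmazonDynamoDB dax;   // the DAX client, used via the AmazonDynamoDB interface
    private final String tableName;     // hypothetical table

    public EventProcessor(AmazonDynamoDB dax, String tableName) {
        this.dax = dax;
        this.tableName = tableName;
    }

    /** Read the current item, apply the decision logic, and write only when needed. */
    public void handleEvent(String eventId, String newState) {
        Map<String, AttributeValue> key = new HashMap<>();
        key.put("eventId", new AttributeValue(eventId));

        // 1. Read (served from the DAX item cache when possible)
        Map<String, AttributeValue> item = dax.getItem(
                new GetItemRequest().withTableName(tableName).withKey(key)).getItem();

        // 2. Decide: on most days, most events require no write
        boolean mustWrite = (item == null)
                || item.get("state") == null
                || !newState.equals(item.get("state").getS());

        // 3. Possible write (insert or update via PutItem)
        if (mustWrite) {
            Map<String, AttributeValue> newItem = new HashMap<>(key);
            newItem.put("state", new AttributeValue(newState));
            dax.putItem(new PutItemRequest().withTableName(tableName).withItem(newItem));
        }
    }
}
```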
Scenario/Issue:
When a burst occurs that happens to contain a spike of writes, we sometimes (but not always) see a spike of generic `AmazonClientException` instances. These exceptions are flagged as non-retryable (`isRetryable()` returns false), and the root cause is an `InternalServerException` whose generic message only states that there was a 500 response, with no additional detail.
These DAX exceptions and their stack traces give no insight into the true cause (e.g. throttling, a temporary throughput limit before On-Demand scales up, an unavailable host, etc.). This looks like unexpected response behavior from DAX for what appears to be DynamoDB On-Demand resizing (see the additional info below for more hints), but I am still at a loss as to the actual cause when these exceptions do occur in this scenario.
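For reference, this is roughly how we capture the failure details described above. It is a sketch, not the exact production code, and assumes the AWS SDK for Java v1 exception types that the DAX client surfaces:

```java
import java.util.concurrent.Callable;

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;

public final class DaxFailureLogger {

    private DaxFailureLogger() {}

    /** Run a DAX call and, on failure, dump the retry flag and the full cause chain. */
    public static <T> T callAndLog(Callable<T> daxCall) throws Exception {
        try {
            return daxCall.call();
        } catch (AmazonClientException e) {
            // During write-heavy bursts on On-Demand this prints false, so the SDK will not retry.
            System.err.println("isRetryable=" + e.isRetryable());

            for (Throwable cause = e; cause != null; cause = cause.getCause()) {
                String detail = "";
                if (cause instanceof AmazonServiceException) {
                    AmazonServiceException ase = (AmazonServiceException) cause;
                    detail = " status=" + ase.getStatusCode() + " errorCode=" + ase.getErrorCode();
                }
                // The innermost cause is the InternalServerException whose message only mentions a 500.
                System.err.println(cause.getClass().getName() + ": " + cause.getMessage() + detail);
            }
            throw e;
        }
    }
}
```

Wrapping the write as `DaxFailureLogger.callAndLog(() -> dax.putItem(request))` is how we confirmed that the exceptions report non-retryable and that the only detail in the cause chain is the generic 500 message.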
Additional Info:
- I am using the latest available version of the Java DAX client.
- Outside of bursts, reads/writes succeed consistently. This includes a controlled test case in which the same events/data that previously produced exceptions during a burst are replayed as a slow feed.
- When using Provisioned capacity (not On-Demand) we do not see these vague client exceptions. We do see throughput-exceeded exceptions from DAX, as expected for the same burst/spike scenarios, and a back-off retry strategy keyed on those exceptions being retryable (`isRetryable()` returns true) handles them successfully (see the sketch below).