
Application:

I have a Java app that has spikes/bursts in its input, which represent events. For each event there is always a read, a decision (logic), and then a possible write (insert or update) based on that decision. On a day-to-day basis, the majority of reads do not result in a write (insert/update). This is, in theory, a good scenario for DAX and DynamoDB with On-Demand billing/config.
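For context, the per-event flow is roughly the following (a minimal sketch only; the table name "events", the key attribute "eventId", and the decision logic are placeholders, not the real application code):

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
    import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

    import java.util.HashMap;
    import java.util.Map;

    public class EventProcessor {

        // AmazonDynamoDB interface backed by the DAX client (built via AmazonDaxClientBuilder)
        private final AmazonDynamoDB dax;

        public EventProcessor(AmazonDynamoDB dax) {
            this.dax = dax;
        }

        // Per event: read, decide, then write only if the decision requires it.
        public void handle(String eventId, String payload) {
            Map<String, AttributeValue> key = new HashMap<>();
            key.put("eventId", new AttributeValue(eventId));

            // Read - served from the DAX item cache when warm
            Map<String, AttributeValue> existing = dax.getItem(
                    new GetItemRequest().withTableName("events").withKey(key)).getItem();

            // Decision logic (placeholder) - most events end here with no write
            if (existing != null) {
                return;
            }

            // Possible write (insert or update) - write-through via DAX
            Map<String, AttributeValue> item = new HashMap<>(key);
            item.put("payload", new AttributeValue(payload));
            dax.putItem(new PutItemRequest().withTableName("events").withItem(item));
        }
    }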

Scenario/Issue:

When a burst does occur that happens to be composed of a spike of writes, we have sometimes (but not always) seen a spike of generic AmazonClientException instances. The exceptions report that they are not retryable (i.e. isRetryable() is false), and the root cause is an InternalServerException with a generic message String stating only that there was a 500 response (no additional details or clarity).

These DAX exceptions and their stack traces give no insight into their true cause (e.g. throttling, a temporary provisioned-throughput exception before resizing, a host being unavailable, etc.). This looks to me like unexpected response behavior from DAX during what appears to be DynamoDB On-Demand resizing (see the additional info below for more hints), but I am still at a bit of a loss as to the cause when these occur in this scenario.
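This is roughly how the failure surfaces on our side (a sketch, not the exact production code; the handle() call refers to the sketch above and the logging is illustrative):

    import com.amazonaws.AmazonClientException;

    try {
        processor.handle(eventId, payload);
    } catch (AmazonClientException e) {
        // During write bursts: isRetryable() == false and the root cause is an
        // InternalServerException whose message only says there was a 500 response.
        Throwable root = e;
        while (root.getCause() != null) {
            root = root.getCause();
        }
        System.err.printf("DAX call failed: retryable=%s, rootCause=%s, message=%s%n",
                e.isRetryable(), root.getClass().getSimpleName(), root.getMessage());
        throw e;
    }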

Additional Info:

  • I am using the latest available version of the Java DAX client.
  • Outside of bursts, reads/writes are continuously successful. This includes a controlled test case in which the events/data that previously produced exceptions during a burst were replayed as a slow feed.
  • When using Provisioned (not On-Demand) capacity we don't see these vague client exceptions. We do see throughput-exceeded exceptions from DAX, as expected for the same burst/spike scenarios, and handle them with a successful back-off retry strategy keyed on those exceptions' isRetryable() returning true, identifying that we can/should retry (see the sketch below).
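The back-off referenced in the last bullet, in sketch form (the retry limit and sleep values are illustrative, not what we actually run):

    import com.amazonaws.AmazonClientException;

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.function.Supplier;

    public final class Backoff {

        // Retry only when the SDK marks the failure as retryable
        // (e.g. throughput-exceeded under Provisioned capacity).
        public static <T> T withBackoff(Supplier<T> call) throws InterruptedException {
            int attempt = 0;
            while (true) {
                try {
                    return call.get();
                } catch (AmazonClientException e) {
                    if (!e.isRetryable() || attempt >= 5) {
                        throw e;
                    }
                    long sleepMs = (100L << attempt) + ThreadLocalRandom.current().nextLong(100);
                    Thread.sleep(sleepMs);
                    attempt++;
                }
            }
        }
    }

In the On-Demand burst scenario above this never gets a chance to help, because the generic 500s come back with isRetryable() false and fall straight through to the throw.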
  • For on-demand, DynamoDB can spike back up to that table's previous high-water mark instantly. If you peak higher than that, DynamoDB has to create new partitions for the added load. That can throttle with on-demand. When the spike happens, could it be to heights that table has not seen before? It might be worthwhile to force a partitioning by flipping to provisioned mode, temporarily putting the RCUs/WCUs up to your highest heights +50%, letting the partitioning happen, then dropping it back down and flipping back to on-demand 24 hours later (see the sketch after these comments). Your tables are now pre-partitioned. – NoSQLKnowHow Mar 23 '19 at 05:53
  • Also, I recommend calling AWS Support and/or your account manager or SA about your situation to see if they can help on this. This situation has enough variables that it will be difficult to troubleshoot over SO. – NoSQLKnowHow Mar 23 '19 at 05:54
  • @Kirk Thanks for the follow-up. A similar "pre-warming" approach and side test is one I had considered: Provisioned first, then switching over to On-Demand. Should I get the time to execute that and then monitor behavior with a controlled recreation of a burst, I will post back what I see. – user376327 Mar 23 '19 at 14:06
  • One more follow-up: we did open a ticket at the same time I posted this, as another source of possible feedback. I will post any valuable information or answer obtained from support and that ticket. – user376327 Mar 23 '19 at 14:08
  • @user376327 It looks like a gap in DAX's exception translation. Can you ask support to push the case to the DAX team? I don't see it in our support queue yet. – Jeff Hardy Mar 25 '19 at 15:59
  • @Jeff Hardy I have opened a ticket, but I am unaware whether I can give you that case ID here. If there is a way I can tag you on the AWS dashboard to see the case from there or bring it to your attention on that side, I am more than willing to do so. – user376327 Mar 25 '19 at 16:38
  • @user376327 No, don't put anything here - just add a note (if you can) asking the support engineer to make sure it goes directly to the DAX team. – Jeff Hardy Mar 25 '19 at 17:08
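Following up on NoSQLKnowHow's pre-warming suggestion above: if I do run that side test, the flip would look roughly like this (a sketch using the plain DynamoDB client against the base table; the table name "events" and the capacity values are placeholders, and DynamoDB only allows switching billing mode once per 24 hours):

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.model.BillingMode;
    import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
    import com.amazonaws.services.dynamodbv2.model.UpdateTableRequest;

    public class PrePartition {

        // Run first: switch to Provisioned with capacity above the highest expected peak,
        // which forces DynamoDB to create the partitions needed for that throughput.
        static void switchToProvisioned(AmazonDynamoDB ddb, long peakRcu, long peakWcu) {
            ddb.updateTable(new UpdateTableRequest()
                    .withTableName("events")
                    .withBillingMode(BillingMode.PROVISIONED)
                    .withProvisionedThroughput(new ProvisionedThroughput(peakRcu, peakWcu)));
        }

        // Run at least 24 hours later: flip back to On-Demand.
        // The partitions created above are kept, so the table is now pre-partitioned.
        static void switchBackToOnDemand(AmazonDynamoDB ddb) {
            ddb.updateTable(new UpdateTableRequest()
                    .withTableName("events")
                    .withBillingMode(BillingMode.PAY_PER_REQUEST));
        }
    }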
