
I have the rows below in DynamoDB (example data) and want to compute a count in the format shown. Currently I run a Query and paginate through the results, but it is terribly slow because the table has millions of rows. Is there a faster way, given that I only need the count and not the individual items?

Example Data

BrandName BrandCode Eventid
ABC       123       30100
ABC       123       30111
XYZ       456       30100
XYZ       456       30111

OUTPUT

Number of events : 2

There are only 2 distinct event types, based on Eventid, so I want the count to be 2.

Note: The main purpose of the application is to store events that come from an external system. We only want the count above as an audit, to compare how many events were consumed against how many were persisted.

ghostrider

1 Answer


To achieve this you will need to use DynamoDB Streams and a Lambda window function.

Essentially, you stream all item modifications to a Lambda function that listens for INSERT and REMOVE events. Set the Lambda window to, for example, 1 minute; within that window the function sums the individual counts and writes the result back to a single item in DynamoDB. Instead of running a paginated Query, you then only need a GetItem. Of course, the count is eventually consistent, lagging by up to the Lambda time window.

This explains a very similar concept.

This image also depicts something similar where counts for a voting candidate are summed and written back to an item storing the total.
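The stream-and-aggregate idea described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the counter item key `{"pk": "event-count"}`, the attribute name `total`, and the helper names `net_delta`, `apply_delta`, and `make_handler` are all hypothetical. It assumes the function is configured with a Lambda tumbling window, so the event carries `state` and `isFinalInvokeForWindow`, and that `table` behaves like a boto3 `Table` resource.

```python
def net_delta(records):
    """Net change in item count for one batch of DynamoDB stream records."""
    delta = 0
    for record in records:
        name = record.get("eventName")
        if name == "INSERT":
            delta += 1
        elif name == "REMOVE":
            delta -= 1
        # MODIFY does not change the number of items, so it is ignored.
    return delta


def apply_delta(table, delta, counter_key):
    """Add delta to the counter item in one atomic update (no read needed)."""
    table.update_item(
        Key=counter_key,
        UpdateExpression="ADD #total :delta",
        ExpressionAttributeNames={"#total": "total"},  # hypothetical attribute
        ExpressionAttributeValues={":delta": delta},
    )


def make_handler(table, counter_key):
    """Build a Lambda handler that accumulates deltas across one window."""
    def handler(event, context):
        state = event.get("state") or {}
        running = int(state.get("delta", 0)) + net_delta(event.get("Records", []))
        if event.get("isFinalInvokeForWindow"):
            # End of the window: write once instead of once per record.
            if running:
                apply_delta(table, running, counter_key)
            return {"state": {}}
        # Carry the partial sum to the next invocation in the same window.
        return {"state": {"delta": running}}
    return handler
```

In a deployed function one might build the handler once at module load, for example `handler = make_handler(boto3.resource("dynamodb").Table("EventCounts"), {"pk": "event-count"})`, where the table name is again an assumption. Reading the count is then a single GetItem on that key.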


Leeroy Hannigan
  • Thanks!! Why do we need a window? Can't it be real time, so that Lambda is triggered the moment there is a modification in the DB? – ghostrider Feb 16 '23 at 12:01
  • It can be real time; the idea of the window is to avoid a hot key. You are essentially capped at 1000 WCU per second per item. So if you are counting 100 items which share the same PK, you have a max throughput of 10 WCU per item, and 10 * 100 == 1000 is how often you would need to write to the `total` item. – Leeroy Hannigan Feb 16 '23 at 12:58
  • OK, I had one doubt about this approach. Suppose that, due to scaling and concurrency, 4 Lambda instances are running and all of them update the same count value. Won't there be concurrency issues, such as stale data? – ghostrider Feb 20 '23 at 11:54
  • What sort of concurrency issues are you expecting? All single-item updates are atomic. You are essentially updating an item's value like `SET counter = counter + 50`. All updates to the item are strongly serialized. – Leeroy Hannigan Feb 20 '23 at 12:39
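
To illustrate the point in the last comment, here is a small in-memory model (plain Python, no AWS calls, and only an approximation of the real service). It contrasts a read-then-overwrite pattern, where several workers read the same stale value, with serialized increments of the form `counter = counter + delta`. The function names and the sample deltas are made up for the illustration.

```python
from itertools import permutations


def racy_total(start, deltas):
    """Every worker reads the same stale value, then overwrites the item."""
    snapshots = [start for _ in deltas]  # all reads happen before any write
    total = start
    for snapshot, delta in zip(snapshots, deltas):
        total = snapshot + delta  # each write clobbers the previous one
    return total


def atomic_total(start, deltas):
    """Each worker's increment is applied to the current stored value."""
    total = start
    for delta in deltas:
        total += delta  # models a serialized in-place increment
    return total


deltas = [50, 20, -5, 10]  # four concurrent workers, made-up values
# Serialized increments give the same total in every arrival order.
orders = {atomic_total(100, p) for p in permutations(deltas)}
```

In this model the racy version keeps only the last writer's delta, while the serialized increments give one result regardless of order. That is why the update should be expressed as an in-place increment rather than a GetItem followed by a PutItem.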