
I have a database consisting of many items. At least once per day I want to loop through all items in this database and, for each item, call an external API to fetch current data about that item and store it in my database.

With this scenario in mind, I was thinking of using Lambda, DynamoDB and SNS in the following way:

  • Scheduled Lambda (worker) that loops through all items in the DynamoDB table
  • For each item, publish to an SNS topic with details about that specific item
  • Another Lambda (consumer/processor) listens to that SNS topic to get each item
  • For each item received, perform a request to the external API and update the item in DynamoDB
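
The four steps above can be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested deployment: the key attribute `id` (assumed to be a string), the attribute name `data`, the topic ARN, and the helper `fetch_current_data` are all hypothetical, and the clients are passed in as parameters so the logic can be exercised without AWS.

```python
import json


def publish_all_items(table, sns, topic_arn):
    """Scheduled worker: scan the table and publish one SNS message per item.

    `table` is assumed to be a boto3 DynamoDB Table resource and `sns` a boto3
    SNS client; `topic_arn` is a placeholder supplied by the caller.
    """
    published = 0
    scan_kwargs = {}
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            # Only the key is sent; the consumer updates the item by key.
            sns.publish(TopicArn=topic_arn, Message=json.dumps({"id": item["id"]}))
            published += 1
        # A scan returns results in pages; follow LastEvaluatedKey until it is absent.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return published
        scan_kwargs["ExclusiveStartKey"] = last_key


def process_sns_event(event, table, fetch_current_data):
    """Consumer: for each SNS record, call the external API and store the result.

    `fetch_current_data` is a hypothetical callable wrapping the external API.
    """
    for record in event["Records"]:
        item_id = json.loads(record["Sns"]["Message"])["id"]
        data = fetch_current_data(item_id)
        table.update_item(
            Key={"id": item_id},
            UpdateExpression="SET #d = :d",
            # "data" is aliased because it is a DynamoDB reserved word.
            ExpressionAttributeNames={"#d": "data"},
            ExpressionAttributeValues={":d": data},
        )
```

In a real Lambda the arguments would presumably come from `boto3.resource("dynamodb").Table(...)` and `boto3.client("sns")`, created once outside the handler.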

This setup should be scalable, easy to use/configure/maintain and hopefully cost-efficient as well. But will it cope if the DynamoDB table has 1000+ items to loop through at least once every day? Is this setup fault tolerant? Will it cope if I want to trigger this more than once every day and, more importantly, will it still be cost-efficient if triggered, say, once every hour? Is there a better way of doing this?
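
To put rough numbers on the daily versus hourly case, the message volume can be worked out directly. The item count of 1000 comes from the scenario above; the 30-day month is an assumption, and no pricing is included because rates change and should be checked against current AWS pricing.

```python
def messages_per_month(items, runs_per_day, days=30):
    """Each run publishes one message and triggers one consumer invocation per item."""
    return items * runs_per_day * days


daily = messages_per_month(1000, 1)    # one run per day
hourly = messages_per_month(1000, 24)  # one run per hour
print(daily, hourly)  # prints "30000 720000"
```

The same count applies to SNS publishes, consumer Lambda invocations, and DynamoDB item updates, so it is the figure to multiply by whatever per-request rates apply.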

Somehow I feel I should use SQS but maybe it's not really useful when running serverless since you can't poll the queue to fetch new items to process?

Tanax
  • I'm not a Lambda expert, but Lambda seems appropriate and cost effective no matter how often you want to run this. Maximum Lambda run time is 5 minutes, so one master process that queues everything, then many Lambda workers, seems appropriate. Queue pattern based on a timer [here](https://cloudonaut.io/integrate-sqs-and-lambda-serverless-architecture-for-asynchronous-workloads/). Another option could be a DynamoDB property / control table that keeps state, but consider DynamoDB costs. – Tim May 20 '18 at 19:06
  • Thank you @Tim for affirming my setup :) Regarding SQS, what would be the advantage of adding it into the mix? It doesn't work out of the box with serverless setups like Lambda, and I'm not sure the extra trouble brings me anything substantial. Also, is there any other solution you would recommend instead of DynamoDB? Thanks! – Tanax May 20 '18 at 19:52
  • I misread, thought you said SQS. SNS is supported by Lambda, which is easier. Logically SQS might be better, but you can't easily trigger Lambda with SQS. – Tim May 20 '18 at 20:42
  • Thank you @Tim. I agree, SNS just makes it a lot easier to use with Lambda than SQS. Is there anything I will "lose" by going with SNS rather than SQS in this case? And is Dynamo still the best option to use in this case? Thanks :) – Tanax May 21 '18 at 16:09
  • With SQS vs SNS there could be something subtle that will bite you later. How long do you expect your processing to take? There's a design choice between 1) a master process that queues / notifies each item for another Lambda to process, or 2) a process that runs, say, every 5 minutes, checks the last processing time, and processes what it can within some timeout period. Option 2 removes the need for SQS/SNS. Dynamo is generally good with Lambda, just make sure you consider the cost implications, as you pay for peak traffic, not average traffic, so spread your processing out. – Tim May 21 '18 at 20:17
  • SQS is now supported on Lambda apparently so I guess I can just use SQS now instead :) https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/ – Tanax Jun 29 '18 at 08:27
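
For the SQS route mentioned in the last comment, a consumer could look roughly like this. It is a minimal sketch, assuming each message body is JSON with an `id` field and that partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping, so that only the failed messages return to the queue. `refresh_item` is a hypothetical callable that calls the external API and updates DynamoDB.

```python
import json


def process_sqs_event(event, refresh_item):
    """Process a batch of SQS records and report which ones failed.

    Assumes ReportBatchItemFailures is enabled; without it, Lambda ignores
    this return value and treats a normal return as success for the batch.
    """
    failures = []
    for record in event["Records"]:
        try:
            refresh_item(json.loads(record["body"])["id"])
        except Exception:
            # Leave this message on the queue so SQS redelivers it later.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Compared with SNS, this gives per-message retries, and a dead-letter queue can be attached to the source queue for items that keep failing.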
