Summary

What's the best way to transform some data in DDB tables and send that data to other teams for processing using AWS services?

Details

Let's say I'm on the "customers" team, and I have a system where new customers can be created and the customer info is stored in DDB tables. This system has been around for a while so the existing DDB tables are fairly large, and the system is fairly active so new customers are frequently created.

I'm working on a project where I need to expose specific (not all) customer information to other teams, both for each new customer at creation time and for all existing customers. For each customer, clients may or may not perform some operations based on the info. When the project is complete, clients will need to be able to:

  1. Onboard with us and receive specific information for all existing customers
  2. Receive specific customer information as soon as new customers are created

For #2, I think I'm leaning towards publishing an event to an SNS topic each time a customer is created. Clients can subscribe to the SNS topic if they would like to receive this info on-demand.

I'm not sure what the best way to do #1 is, however.

Some Potential Options

If I could do anything, each time a client onboards with us I would create a dedicated SQS queue for them and then run some sort of job (a Lambda?) that goes through our DDB table and puts each customer's info on the queue as a message. I'm aware that's not how SQS is designed to work and that it's not scalable.

Another option could be a script that is triggered during onboarding: the client creates an SQS queue and passes its ID to the script, which then goes through the customer DDB table and adds each customer's info to the queue as a message.

robl

1 Answer

I think, for your goal #2, the SNS-SQS fanout design is a good fit. If you have control over how messages are sent to SNS you can make use of SNS message attributes to filter the messages before sending them further to the SQS queues.
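
As a rough sketch of the attribute-based filtering described above: the helper below builds the arguments for an SNS publish call, and a second helper builds a subscriber's filter policy. The field names (`customer_id`, `name`, `region`), the `event_type` and `region` attributes, and both function names are assumptions made for illustration, not part of the original system.

```python
import json

# Hypothetical allow-list of the customer fields other teams may see.
EXPOSED_FIELDS = ("customer_id", "name", "region")


def build_customer_created_event(topic_arn, customer, exposed_fields=EXPOSED_FIELDS):
    """Return keyword arguments for an SNS publish call for one new customer."""
    # Only the allow-listed fields leave the customers team.
    payload = {k: customer[k] for k in exposed_fields if k in customer}
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(payload),
        # Attributes travel next to the body so that subscription
        # filter policies can match on them.
        "MessageAttributes": {
            "event_type": {"DataType": "String", "StringValue": "customer_created"},
            "region": {
                "DataType": "String",
                "StringValue": str(customer.get("region", "unknown")),
            },
        },
    }


def build_filter_policy(regions):
    """Return a filter policy (JSON string) for a subscriber that only
    wants customer_created events from the given regions."""
    return json.dumps({"event_type": ["customer_created"], "region": list(regions)})
```

With boto3 this would be used roughly as `sns.publish(**build_customer_created_event(topic_arn, customer))`, and the policy string would be passed as the `FilterPolicy` attribute when the client's SQS queue is subscribed to the topic.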

For your goal #1, to keep things simpler for the clients, you can indeed push the onboarding data to SQS (bypassing SNS), so there will be a single source of the customers' data. Some other benefits of this approach:

  • SQS works as a buffer, so you and the clients independently decide how fast to publish and consume;
  • you are only responsible for delivering the data to SQS; if the clients fail to consume it, that is their problem;
  • if a client fails to consume some messages, they stay in the queue (up to its retention period) or go to a DLQ, from which they can be retrieved later.
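
A minimal sketch of such a push, assuming `table` is a boto3 DynamoDB `Table` resource and `sqs` is a boto3 SQS client (both are passed in, so nothing here depends on how they are created). The function names and the `exposed_fields` parameter are hypothetical. It follows scan pagination via `LastEvaluatedKey` and sends messages in batches of 10, the limit for `send_message_batch`.

```python
import json


def iter_customers(table):
    """Yield every item of a DynamoDB table, following scan pagination."""
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return
        kwargs["ExclusiveStartKey"] = last_key


def batches(items, size=10):
    """Group an iterable into lists of at most `size` elements."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def backfill_queue(table, sqs, queue_url, exposed_fields):
    """Copy the exposed fields of every customer into the client's queue.

    Returns (number of messages accepted, entries SQS reported as failed).
    """
    sent, failed = 0, []
    for batch in batches(iter_customers(table)):
        entries = [
            {
                "Id": str(i),  # only has to be unique within one batch
                # default=str copes with the Decimal values boto3 returns
                "MessageBody": json.dumps(
                    {k: c[k] for k in exposed_fields if k in c}, default=str
                ),
            }
            for i, c in enumerate(batch)
        ]
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed_ids = {f["Id"] for f in response.get("Failed", [])}
        failed.extend(e for e in entries if e["Id"] in failed_ids)
        sent += len(entries) - len(failed_ids)
    return sent, failed
```

The returned `failed` entries are left for the caller to retry; a real job would probably also want throttling so the scan does not eat into the table's read capacity.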

However, depending on how big the volumes are, you may want to remove SQS as the middleman to speed up the onboarding.

You may use Lambda to push data from your side, but keep in mind that Lambda's maximum execution time is 15 minutes. If you expect the process to take longer, now or sometime in the future, consider using ECS one-off tasks (optionally with the serverless Fargate launch type, so you have fewer things to worry about).
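
If the job does run in Lambda, one way to live with the time limit is to stop scanning before the deadline and hand back the position to resume from. The sketch below assumes `table` is a boto3 DynamoDB `Table` resource; `scan_with_budget`, `handle_items`, and the 60-second safety margin are hypothetical choices. In a Lambda handler, `remaining_ms` could be the context object's `get_remaining_time_in_millis` method.

```python
def scan_with_budget(table, handle_items, remaining_ms, start_key=None, safety_ms=60_000):
    """Scan page by page until the table is exhausted or time runs short.

    Returns None when the scan is complete, otherwise the key that a
    later run should pass as `start_key` to continue where this one stopped.
    """
    kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
    while True:
        page = table.scan(**kwargs)
        handle_items(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return None  # whole table processed
        if remaining_ms() < safety_ms:
            return last_key  # out of budget, resume from here next time
        kwargs["ExclusiveStartKey"] = last_key
```

How the returned key reaches the next run (a follow-up invocation, a workflow step, a stored checkpoint) is left open here. The same checkpoint also makes an ECS task restartable after a failure.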

Anton