
I'm trying to gather ideas on how to design a system using only AWS resources.

Basically, I have a file in S3 (a TSV, for the sake of argument) with a variable number of records (thousands to millions).

For each record (row) I need to hit an API endpoint and get a response, which I would like to save back to S3.
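
For context, the naive sequential version of the task might look like the sketch below (the bucket, keys, and endpoint URL are hypothetical placeholders); the question is really how to distribute this work:

```python
import json
import urllib.request

import boto3

s3 = boto3.client("s3")

# Hypothetical names, for illustration only.
BUCKET = "my-bucket"
INPUT_KEY = "input/records.tsv"
OUTPUT_KEY = "output/responses.json"
API_URL = "https://api.example.com/lookup"

def process_file():
    obj = s3.get_object(Bucket=BUCKET, Key=INPUT_KEY)
    results = []
    # Stream the TSV line by line instead of loading it all into memory.
    for line in obj["Body"].iter_lines():
        fields = line.decode("utf-8").split("\t")
        # One API call per record; real code would add retries and backoff.
        req = urllib.request.Request(
            API_URL,
            data=json.dumps({"record": fields}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read()))
    # Save all responses back to S3 as a single object. With millions of
    # rows, a real implementation would write results out incrementally.
    s3.put_object(
        Bucket=BUCKET,
        Key=OUTPUT_KEY,
        Body=json.dumps(results).encode("utf-8"),
    )
```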

I know I can use Kinesis (Data Streams, Firehose) to do this in a streaming way.
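
In that design, a producer would push rows into a stream and a Lambda consumer subscribed to it would make the API calls. A minimal producer sketch, assuming a hypothetical stream name:

```python
import boto3

kinesis = boto3.client("kinesis")

def put_rows(rows):
    # PutRecords accepts at most 500 records per call.
    for i in range(0, len(rows), 500):
        batch = rows[i:i + 500]
        response = kinesis.put_records(
            StreamName="row-processing-stream",  # hypothetical stream name
            Records=[
                {"Data": row.encode("utf-8"), "PartitionKey": str(i + j)}
                for j, row in enumerate(batch)
            ],
        )
        # Real code would inspect response["FailedRecordCount"] and retry.
```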

Any experts out there who can shed some light on a better/different approach to this problem? Maybe there's a design pattern I'm unaware of.

Alexander
  • Could you clarify what's wrong with the Kinesis approach, and why you think it's not well suited for this scenario? – Marcin Aug 19 '20 at 07:39
  • I guess nothing is inherently wrong with the Kinesis approach! I'd just like to confirm whether it's the ideal approach, or whether anyone has a different suggestion. Thank you – Alexander Aug 19 '20 at 19:36
  • Kinesis sounds good. You could also consider putting individual rows of your CSV into SQS and then using Lambda to trigger the endpoint for each row (see the sketch after this thread). This works if order is not important, and it may be easier to use than Kinesis. – Marcin Aug 19 '20 at 21:40
  • 1
    I like that idea better of using SQS (rather than kinesis) order does not matter. Then, like you said, I can have each message trigger a lambda and endpoint and save the result to a DynamoDB, which i can later query and output the batch to S3 as one – Alexander Aug 20 '20 at 18:47
  • Yes. SQS would be easier to work with as you don't have to worry about provisioning throughput for Kinesis Streams and managing partitioning. – Marcin Aug 20 '20 at 21:08
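
A rough sketch of the SQS variant discussed above, assuming a hypothetical queue URL, table name, and endpoint: one function fans rows out to SQS, and a Lambda consumer (triggered by the queue) calls the endpoint and writes each response to DynamoDB for a later batch export to S3.

```python
import json
import urllib.request

import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.resource("dynamodb")

# Hypothetical names, for illustration only.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/rows"
TABLE = dynamodb.Table("api-responses")  # partition key assumed to be "id"
API_URL = "https://api.example.com/lookup"

def enqueue_rows(rows):
    # SendMessageBatch accepts at most 10 messages per call.
    for i in range(0, len(rows), 10):
        sqs.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[
                {"Id": str(i + j), "MessageBody": row}
                for j, row in enumerate(rows[i:i + 10])
            ],
        )

def handler(event, context):
    # Lambda consumer triggered by SQS: one API call per message,
    # result stored in DynamoDB for a later batch export to S3.
    for record in event["Records"]:
        req = urllib.request.Request(
            API_URL,
            data=record["body"].encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            TABLE.put_item(Item={
                "id": record["messageId"],
                "response": resp.read().decode("utf-8"),
            })
```

SQS also scales consumption automatically: Lambda polls the queue and scales concurrency with backlog, so there is no shard or throughput provisioning as with Kinesis Data Streams.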

0 Answers