
First of all: I am receiving about 50,000 products from a supplier via an API. The API has no pagination and therefore sends all 50k products in one GET request.

I tried to handle this by fetching the data and storing it in DynamoDB using an AWS Lambda function.

Currently the DynamoDB table has auto scaling enabled up to 25 write capacity units, but write throttling still runs high (up to 40-50 throttled requests). As a result, the Lambda function takes very long to execute and runs out of the 15-minute limit.

Because the API has no pagination, I also need to give the Lambda 1 GB of memory.

I am now wondering what the best approach is for my case. Of course I could increase the DynamoDB write capacity limit more and more, but I am looking for a cost-effective way of handling this.

As programming language I am using Golang, and yes, aws-sdk-go-v2 is used for all the DynamoDB work in the code.
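
Roughly, the write path looks like the sketch below. This is a simplified version with a placeholder Product struct, a placeholder table name, and a crude fixed backoff on the UnprocessedItems that DynamoDB returns when writes are throttled; it is not my exact code.

```go
// A simplified sketch of the batched write path, assuming a placeholder
// Product struct and a table called "products" (both are assumptions).
package main

import (
	"context"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// Product is a placeholder for the supplier's product payload.
type Product struct {
	SKU   string  `dynamodbav:"sku"`
	Name  string  `dynamodbav:"name"`
	Price float64 `dynamodbav:"price"`
}

// writeProducts writes all products with BatchWriteItem, 25 items per call,
// and retries the UnprocessedItems that DynamoDB returns when it throttles.
func writeProducts(ctx context.Context, client *dynamodb.Client, table string, products []Product) error {
	for start := 0; start < len(products); start += 25 {
		end := start + 25
		if end > len(products) {
			end = len(products)
		}
		requests := make([]types.WriteRequest, 0, end-start)
		for _, p := range products[start:end] {
			item, err := attributevalue.MarshalMap(p)
			if err != nil {
				return err
			}
			requests = append(requests, types.WriteRequest{
				PutRequest: &types.PutRequest{Item: item},
			})
		}
		unprocessed := map[string][]types.WriteRequest{table: requests}
		for len(unprocessed) > 0 {
			out, err := client.BatchWriteItem(ctx, &dynamodb.BatchWriteItemInput{
				RequestItems: unprocessed,
			})
			if err != nil {
				return err
			}
			unprocessed = out.UnprocessedItems
			if len(unprocessed) > 0 {
				// Crude fixed backoff before retrying throttled items.
				time.Sleep(500 * time.Millisecond)
			}
		}
	}
	return nil
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := dynamodb.NewFromConfig(cfg)
	// In the real Lambda the slice comes from the supplier API response.
	if err := writeProducts(context.TODO(), client, "products", nil); err != nil {
		log.Fatal(err)
	}
}
```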

Hopefully someone here can help me out.

MIstudent
  • Not sure what you want to do with the products afterwards. But it might be easier to write them to a file in S3 instead of DynamoDB? Files might sound old school and not as fancy, but they do the job well. – Jens Apr 09 '22 at 08:10
  • I want to upload them over an existing API into a sales system. This should be a scheduled Lambda which runs once a week. There will be more Lambdas synchronizing stocks and prices. The target API has limitations like only x write requests per day. Therefore I need to make sure that only changed data is processed, so I need to store the information in a database before applying it to the target API. – MIstudent Apr 09 '22 at 17:04
  • Why not use both? First store the data in S3 and then trigger a Lambda (or an ECS task) to process this data and send it "slower" to DynamoDB, if the time spent processing these messages does not matter. I know DynamoDB can be incredibly expensive, but with this solution you reach two goals: the 50k products are stored in AWS as fast as possible, and you can process them afterwards in a cost-efficient way. – Daniel Seichter Apr 09 '22 at 20:09
  • Another approach: in a first step, split all messages into single messages and send them to SQS, which will trigger a Lambda. Advantages: retries, visibility into the processing of the messages, a dead-letter queue for further retries, etc., and with fewer than one million records it produces no (high) further costs. A rough sketch of this step is shown below the comments. – Daniel Seichter Apr 09 '22 at 20:12
  • The whole truth is that it should not take hours to process the data. In the case of stock updates, it should not take hours until the new stock levels are processed. And of course there will be more suppliers coming that I want to process, which means more writes to the database, which results in higher costs. I really cannot accept that it would cost hundreds of dollars a month only for the database. Maybe AWS is not the best solution for this project? @DanielSeichter – MIstudent Apr 09 '22 at 21:13
  • How often do you have read/write access to this database? If not regularly, you can also try RDS Aurora Serverless (zero cost when not used, autoscaling under lots of incoming requests), or take a look at DocumentDB (t3 or t4g) if you like/need the NoSQL approach. Thinking about your needs some more, you could also try classic RDS... a "normal" t3 instance can handle this workload and you can scale up if needed. – Daniel Seichter Apr 09 '22 at 21:27
  • The access will be pretty regular, something like every 1-2 hours. How about launching an EC2 instance with a PostgreSQL database running on it? That should be completely free. The instance can easily be secured by not allowing internet access. – MIstudent Apr 10 '22 at 20:39
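
For illustration, here is a rough sketch of the SQS fan-out idea from the comments, using aws-sdk-go-v2: the supplier payload is split into single messages, sent 10 at a time, so a queue-triggered Lambda can write them to DynamoDB at its own pace. The queue URL, the message IDs, and the Product fields are assumptions; the consumer Lambda and dead-letter queue are not shown.

```go
// A rough sketch of splitting the supplier payload into single SQS messages
// (10 per SendMessageBatch call). Queue URL and Product fields are placeholders.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
	"github.com/aws/aws-sdk-go-v2/service/sqs/types"
)

// Product is a placeholder for the supplier's product payload.
type Product struct {
	SKU   string  `json:"sku"`
	Name  string  `json:"name"`
	Price float64 `json:"price"`
}

// enqueueProducts sends each product as its own SQS message, batched 10 at a time.
func enqueueProducts(ctx context.Context, client *sqs.Client, queueURL string, products []Product) error {
	for start := 0; start < len(products); start += 10 {
		end := start + 10
		if end > len(products) {
			end = len(products)
		}
		entries := make([]types.SendMessageBatchRequestEntry, 0, end-start)
		for i, p := range products[start:end] {
			body, err := json.Marshal(p)
			if err != nil {
				return err
			}
			entries = append(entries, types.SendMessageBatchRequestEntry{
				Id:          aws.String(fmt.Sprintf("msg-%d", start+i)), // unique within one batch
				MessageBody: aws.String(string(body)),
			})
		}
		if _, err := client.SendMessageBatch(ctx, &sqs.SendMessageBatchInput{
			QueueUrl: aws.String(queueURL),
			Entries:  entries,
		}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := sqs.NewFromConfig(cfg)
	// Placeholder queue URL; retries of failed DynamoDB writes would be
	// handled by the consumer Lambda plus a dead-letter queue.
	queueURL := "https://sqs.eu-central-1.amazonaws.com/123456789012/products-queue"
	if err := enqueueProducts(context.TODO(), client, queueURL, nil); err != nil {
		log.Fatal(err)
	}
}
```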

0 Answers