
I have defined an SQS trigger for my Lambda. Inside the Lambda, I am calling a third-party API that is rate-limited by tokens (250 tokens per minute). Initially I had set a batch size of 250 and a batch window of 65 seconds, but the Lambda processed the requests concurrently and the tokens were exhausted very quickly.

Then, after trying various values for batch size, batch window, and concurrency, the process finally ran smoothly with batch size 10, batch window 10, and reserved concurrency 7, but at that time there were only 300,000 product IDs in the queue. Yesterday, when I pushed 4 million product IDs to the queue, the tokens again started getting exhausted very quickly. When I checked the logs, I found that Lambda picks up a different number of messages at different intervals: sometimes it takes 200 messages in a minute and sometimes 400. This number differs every time.

What I want is for Lambda to pick up only 250 messages from the queue per minute, no matter how many messages are in the queue. How can I do this?


nats
  • How come it got 200 or 400 messages if your batch size is 10? – Marcin Mar 19 '21 at 06:16
  • Yes that's the problem. I have uploaded the logs. You can check it. – nats Mar 19 '21 at 06:23
  • @Marcin Those 200 and 400 were per minute, not batch size. And those values seem ok, if you set batch size and window both to 10. – Jens Mar 19 '21 at 06:30
  • The max I can count is 350 messages = 7 concurrency * 10 messages per batch * 5 threads. But if your function runs for less than 1 minute, then you can easily go over 400. But anyway, the answer is that you can't control this fully. It's up to AWS how it polls and scales the polling. – Marcin Mar 19 '21 at 06:31
  • @Marcin So what should I change to get only 250 messages per minute? And do these 5 SQS polling threads remain constant every time, or does this change too? – nats Mar 19 '21 at 06:38
  • Is this not exactly the same thing we've talked about extensively a couple of weeks ago in chat? [link to question](https://stackoverflow.com/questions/66397467/sqs-batching-for-lambda-trigger-doesnt-work-as-expected) – Maurice Mar 19 '21 at 07:40
  • Yes @Maurice. With that approach as well, the number of messages picked up by Lambda was different every time. – nats Mar 19 '21 at 09:02

2 Answers


I don't think SQS is the right product for this kind of problem. What you are looking for is throttling, and SQS is not the right tool for that.

For example: you set the batch size to 10 and the window to 10. That does not mean what you think it means.

You are telling SQS to batch a maximum of 10 items for a maximum of 10 seconds. But if SQS has 10 items after 1 second, it will trigger your Lambda.
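Both knobs live on the event source mapping. As a sketch, they can be set with the AWS CLI like this (the UUID is a placeholder for your own mapping ID):

```shell
# Cap batches at 10 messages and wait at most 10 seconds to fill a batch.
# The UUID is hypothetical; find yours with `aws lambda list-event-source-mappings`.
aws lambda update-event-source-mapping \
  --uuid "your-mapping-uuid" \
  --batch-size 10 \
  --maximum-batching-window-in-seconds 10
```

Remember that these are maximums, not guarantees: Lambda may still be invoked earlier with a full batch.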

Looking at your requirements, it looks like you are putting a lot more data into the queue than you can read from it.

Considering this, I would propose you write that data to DynamoDB first and then have a job, triggered by EventBridge, that runs every minute, picks up exactly 250 items (or however many you have tokens for) from DynamoDB, and does the work.

In summary:

  1. Put your items into SQS
  2. Trigger Lambda A from SQS
  3. Lambda A writes the items to DynamoDB
  4. Create EventBridge rule to trigger a Lambda B every 60 seconds
  5. Lambda B reads n items from DynamoDB and processes them
Jens
  • So what I am doing is getting the product IDs from RDS into SQS and then into this Lambda. – nats Mar 19 '21 at 07:10
  • If your data is already in RDS you probably can just use Lambda B and directly read from RDS if possible. Makes things easier. – Jens Mar 19 '21 at 07:11
  • Yes, it would make things easier, but the only problem with this approach is how to update the limit and offset in every invocation so that it picks up the next set of records. Even if I update the limit and offset at the end of the 1st invocation, the 2nd time it triggers it will start from 0 again. – nats Mar 19 '21 at 07:37
  • @nats Yeah. There are probably different challenges to solve then. One option might be to create a temporary copy of that table and use that as source for your job. Then, when the Lambda processed 250 items, it deletes them from that temporary table. This repeats until no items are left and then the table can be deleted. This might be a pragmatic way to solve this. But for us here at SO, it is hard to find a proper solution for your problem with limited information. I guess at least you now know that you need something different ;) – Jens Mar 19 '21 at 08:02
  • Yes, now I know what to do. Thanks a lot. – nats Mar 19 '21 at 09:01
  • I tried a new approach to this problem. I picked the limit and offset from an S3 file, and after the Lambda invocation completes, I update that file in S3 so that next time it picks up the updated limit and offset. In one invocation I fetched 250 product IDs from the DB directly into my Lambda and then called the API for these product IDs. But in 1 minute only 50 API calls were successful, and then it gave a Lambda timeout error. Any suggestions on this? Is Lambda appropriate for this task? – nats Mar 20 '21 at 08:56
  • @nats You should create another SO question for this. But anyway: check your Lambda's timeout setting. By default it is 3 seconds, which in most cases is not enough to reliably send 250 requests. You can set it as high as 900 seconds, which should be plenty. Furthermore, if possible, send those requests in parallel. Maybe not all 250, but 10-20. That will reduce runtime as well. Languages like Go offer easy-to-use concurrency mechanisms like goroutines. – Jens Mar 20 '21 at 11:29

The long story short is that you don't have full control over how Lambda polls SQS. This is made clear by an AWS rep in SQS Lambda Trigger Polling Issue:

Since this is entirely dependent on the Lambda service, the polling mechanism cannot be controlled.

What's more, Lambda uses five polling threads:

the Lambda service will begin polling the SQS queue using five parallel long-polling connections.

So with your setup, you can easily end up with thousands of messages polled per minute (depending on how long a function runs):

7 concurrency * 10 messages per batch * 5 threads * 6 polls per minute = 2100 messages per minute
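The arithmetic behind this worst-case bound (the 6 polls per minute comes from the 10-second batch window):

```python
# Upper bound on messages consumed per minute with the settings in the question.
concurrency = 7        # reserved concurrency
batch_size = 10        # maximum messages per batch
polling_threads = 5    # parallel long-polling connections used by Lambda
polls_per_minute = 6   # 60 s / 10 s batch window

print(concurrency * batch_size * polling_threads * polls_per_minute)  # 2100
```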

As the AWS rep writes, the only way to combat this issue is to not use SQS with Lambda directly:

The only way to mitigate this is to disable the SQS triggers on the Lambda function.

Marcin
  • "7 concurrency * 10 messages per batch * 5 threads * 6 polls per minute = 2100 per minute" – in this, what is "6 polls per minute"? – nats Mar 20 '21 at 09:22
  • @nats It's from the batch window. You set it to 10 seconds, so you can get 6 batches in one minute. – Marcin Mar 20 '21 at 09:26