The setup is pretty simple:
- one backend service (e.g. Java)
- one lambda (e.g. JavaScript)
- one SQS
backend service sends a message to SQS -> SQS triggers lambda -> lambda code is executed
I added a log right after the backend service sent the message to SQS.
Then I viewed the logs for the lambda using CloudWatch.
To my surprise there is a gap of around 10 seconds between the backend service log and first lambda log, which means that SQS took around 10s to receive, process and send the message out again.
Question
Is it normal that SQS takes around 10s to process an incoming message? If yes, can it be made faster?
UPDATE
Thanks to the comment I was able to track this down a bit more. In my case we use TypeScript CDK and the lambda is created with following SqsEventSource
:
import eventSources = require("@aws-cdk/aws-lambda-event-sources")
const eventSource = new eventSources.SqsEventSource(
sqs.Queue.fromQueueAttributes(scope, "SomeId", {
queueArn: "someArn",
queueName: "someQueueName",
}),
{
batchSize: 100,
maxBatchingWindow: cdk.Duration.seconds(5),
},
)
As far as I understand
batchSize: 100
andmaxBatchingWindow: cdk.Duration.seconds(5)
together essentially mean: "Execute lambda immediately if there are 100 events or at latest after 5s".
So 10s is still 5s too much (it sometimes even takes 15-20s).
Is this a bug?
UPDATE 2
To make the hole process more testable I decided to directly send the message to the queue via CLI (so the backend service is not relevant from here anymore):
aws sqs --endpoint <endpoint> send-message --queue-url <queue-url> --message-body '{"test":"test1"}'
I executed the above command 10 times in a row (it took max. 7s to execute all of them) and then checked the lambda logs in CloudWatch:
(I could check how many messages were processed by each request with application logs, which are not visible here)
So as you can see there is a gap of 16s between the first and second lambda execution (the rest looks fine). Since I sent all 10 SQS message within max. 7 seconds, that should never be possible. There should be a gap of maximum 5 seconds or worst 10s.
What could be the cause here? Is this a bug?
Side note:
I actually came across this because a system test of my application was failing since it was so slow. For the system test it happens like 50% of the time that it's slow and the other 50% it's working as expected.