SQS FIFO to trigger lambda only after 10 messages in the queue

Question

I will explain our use-case and our current approach (it might be inefficient) and the issue we have with our implementation.

Use case

We have a lot of data coming into MQTT topics (IoT Core), and we need to process each record: append some information from the topic name to the underlying MQTT data and MAYBE write it to DynamoDB.
Not all records need to be written to DynamoDB; it depends on where the information comes from. For example, sensor 1 might be marked "disabled" in DynamoDB, in which case all records coming from that sensor must NOT be processed.

Implementation

1. Every record from the MQTT topic invokes a Lambda, which processes it and writes it to the appropriate SQS FIFO queue.
2. The records on the SQS queue trigger another Lambda, which in turn calls DynamoDB ONCE to check whether any of these records need further processing before being saved to DynamoDB.

Problem

From step 2 of the implementation, we are aware that the batch size limit for an SQS FIFO queue is 10. But because we need to make a GET request to DynamoDB for every record, we want to make that request once per 10 records, not once per record in the queue. Currently, every new record in the queue fires the Lambda, which makes a GET request.

Is there any way we can "wait" or "ensure" that the batch_limit is full BEFORE triggering the Lambda?

I will illustrate it further with some dummy code/logic

dynamoDB = [{sensorID: 1, enabled: true}, {sensorID: 2, enabled: false}]

mqttData = [
    {sensorID: 1, data: "on"}, {sensorID: 1, data: "on"},
    {sensorID: 1, data: "off"}, {sensorID: 1, data: "on"},
    {sensorID: 1, data: "off"}, {sensorID: 2, data: "on"},
    {sensorID: 2, data: "on"}, {sensorID: 2, data: "off"},
    {sensorID: 2, data: "on"}, {sensorID: 2, data: "off"},
]

We have 10 records on the MQTT topic: 5 from sensor 1 and 5 from sensor 2. If the Lambda processes each record separately, it needs to make 10 DynamoDB requests to get the config for each sensor. If we could instead process all 10 records in one go, we would see that there are only 2 unique sensors (1 and 2), fetch the config for those 2 sensors once, and reuse it for the remaining records.
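To make the intended per-batch logic concrete, here is a minimal Python sketch of what the second Lambda could do when it does receive several messages in one invocation. The function name `filter_enabled_records` and the injected `fetch_configs` callable are hypothetical; in a real function `fetch_configs` might wrap a DynamoDB batch read, which is not shown here. The only thing taken from the SQS event format is that each record carries its payload as a JSON string under `body`.

```python
import json
from typing import Callable, Dict, Iterable, List


def filter_enabled_records(
    event: dict,
    fetch_configs: Callable[[Iterable[int]], Dict[int, dict]],
) -> List[dict]:
    """Return the records of one SQS batch whose sensor is enabled."""
    # Each SQS record carries the message payload as a JSON string in "body".
    records = [json.loads(r["body"]) for r in event["Records"]]

    # Deduplicate sensor IDs so configs are fetched once per batch, not per record.
    sensor_ids = {rec["sensorID"] for rec in records}
    configs = fetch_configs(sensor_ids)

    # Assumption: a sensor with no config entry is treated as disabled.
    return [
        rec for rec in records
        if configs.get(rec["sensorID"], {}).get("enabled", False)
    ]


# Stand-in for the DynamoDB table from the question (hypothetical data).
FAKE_TABLE = {1: {"enabled": True}, 2: {"enabled": False}}
lookups = []  # one entry per config fetch, to show how many fetches happen


def fake_fetch_configs(sensor_ids):
    lookups.append(sorted(sensor_ids))
    return {sid: FAKE_TABLE[sid] for sid in sensor_ids if sid in FAKE_TABLE}


# The ten records from the question, wrapped the way SQS delivers them.
mqtt_data = (
    [{"sensorID": 1, "data": d} for d in ["on", "on", "off", "on", "off"]]
    + [{"sensorID": 2, "data": d} for d in ["on", "on", "off", "on", "off"]]
)
event = {"Records": [{"body": json.dumps(rec)} for rec in mqtt_data]}
kept = filter_enabled_records(event, fake_fetch_configs)
```

With this shape, the number of config lookups depends on how many messages Lambda delivers per invocation, which is exactly the part the question asks how to control.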

Thank you for your time.

No there is no such way. You have to develop a fully custom solution. — Marcin, Apr 27 '23 at 00:04
You have caching options with DAX and your Lambda function can potentially cache items that can be reused on the next, warm function invocation. — jarmod, Apr 27 '23 at 00:13
If I cache across invocations, how would the cache get invalidated when the data in DynamoDB is updated? — D3V, Apr 27 '23 at 00:20
First of all I've assumed that reducing the number of requests that hit DynamoDB is effectively your goal but if it isn't then please say what is. If you have zero tolerance for potentially stale cache items then DAX is a readthrough/writethrough cache. — jarmod, Apr 27 '23 at 00:52

score 0 · Answer 1 · answered Apr 28 '23 at 11:59

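I don't know the optimal solution, but you could try setting the batch window to 10 seconds. Be aware, though, that as far as I know the Lambda batch window setting applies only to standard SQS queues, so it may not be available for a FIFO queue.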

There is also another trick with Lambda: data you keep in memory outside the handler stays available to the next invocation as long as the Lambda's execution environment is still alive. So you can save lastDynamoDbCheckDate at the beginning of the function and read it on the next execution.

if (now() - lastDynamoDbCheckDate > 10 seconds)
    checkDynamoDb()
    lastDynamoDbCheckDate = now()

This way you make fewer requests to DynamoDB.
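As a rough illustration of this in-memory trick, the Python sketch below keeps the sensor configs in module-level state and reloads them only when they are older than a time-to-live. The names `get_sensor_configs`, `load_configs`, and `CACHE_TTL_SECONDS` are made up for the example, and the 10-second TTL is simply the value used above. The TTL also bounds how stale a cached config can get after DynamoDB is updated, although it does not invalidate the cache immediately.

```python
import time
from typing import Callable, Dict

CACHE_TTL_SECONDS = 10.0  # assumed tolerance for stale sensor configs

# Module-level state is reused across invocations while the execution
# environment stays warm; it is lost on a cold start.
_state = {"configs": {}, "loaded_at": float("-inf")}


def get_sensor_configs(
    load_configs: Callable[[], Dict[int, dict]],
    now: Callable[[], float] = time.monotonic,
) -> Dict[int, dict]:
    """Return cached configs, reloading them once they are older than the TTL."""
    if now() - _state["loaded_at"] > CACHE_TTL_SECONDS:
        _state["configs"] = load_configs()  # e.g. a DynamoDB read in practice
        _state["loaded_at"] = now()
    return _state["configs"]


# Demonstration with a fake loader and a fake clock (both hypothetical).
calls = []


def fake_load():
    calls.append(1)
    return {1: {"enabled": True}, 2: {"enabled": False}}


fake_time = [100.0]


def clock():
    return fake_time[0]


get_sensor_configs(fake_load, clock)  # first call: loads
get_sensor_configs(fake_load, clock)  # within the TTL: served from memory
fake_time[0] += 11
get_sensor_configs(fake_load, clock)  # TTL expired: loads again
```

Each concurrent execution environment holds its own copy of this state, so the saving applies per warm environment rather than globally.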

Does AWS Lambda reset memory on each invocation?
