I will explain our use case, our current approach (which may be inefficient), and the issue we have with our implementation.
Use case
- We have a lot of data coming into MQTT topics (IoT Core), and we need to process each record: append some information from the topic name to the MQTT payload and MAYBE write it to DynamoDB.
- Not all records need to be written to DynamoDB; it depends on where the information comes from. For example, sensor 1 might be marked "disabled" in DynamoDB, in which case records coming from this sensor must NOT be processed.
Implementation
- Every record on the MQTT topic invokes a Lambda, which processes it and writes it to the appropriate FIFO SQS queue.
- The records on the SQS queue trigger another Lambda, which in turn calls DynamoDB ONCE to check whether any of these records need further processing before being saved to DynamoDB.
Problem
Regarding step 2 of the implementation: we are aware that the batch limit is 10 for a FIFO SQS queue. Because we need to make a GET request to DynamoDB for the records we receive, we want to make that GET request once per 10 records, not once per record in the queue. Currently, every new record in the SQS queue fires the Lambda, which makes a GET request.
Is there any way we can "wait" or "ensure" that the batch_limit is full BEFORE triggering the Lambda?
I will illustrate it further with some dummy code/logic:
dynamoDB = [{sensorID: 1, enabled: true}, {sensorID: 2, enabled: false}]
mqttData = [
{sensorID: 1, data: "on"}, {sensorID: 1, data: "on"},
{sensorID: 1, data: "off"}, {sensorID: 1, data: "on"},
{sensorID: 1, data: "off"}, {sensorID: 2, data: "on"},
{sensorID: 2, data: "on"}, {sensorID: 2, data: "off"},
{sensorID: 2, data: "on"}, {sensorID: 2, data: "off"},
]
We have 10 records on the MQTT topic: 5 records from sensor1 and 5 records from sensor2. If the Lambda processes each record separately, it needs to make 10 DynamoDB requests to get the config for each sensor. If we could instead process all 10 records in one go in the Lambda, we would find that there are only 2 unique sensors (1 & 2), fetch the config for those 2 sensors, and reuse it for the remaining records.
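To make the desired behaviour concrete, here is a rough Python sketch of the per-batch logic we have in mind. It is not our actual Lambda code: filter_enabled and fake_fetch are hypothetical names, the config lookup is an in-memory stand-in for DynamoDB, and treating unknown sensors as disabled is an assumption made only for this illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def filter_enabled(
    records: List[Record],
    fetch_configs: Callable[[Iterable[int]], Dict[int, bool]],
) -> List[Record]:
    """Keep only records from enabled sensors, using one config lookup per batch."""
    # Collect the distinct sensor IDs in this batch (2 IDs for the 10 dummy records).
    sensor_ids = sorted({r["sensorID"] for r in records})
    # One lookup for the whole batch; fetch_configs is a hypothetical stand-in for DynamoDB.
    enabled = fetch_configs(sensor_ids)
    # Assumption: a sensor with no config entry is treated as disabled.
    return [r for r in records if enabled.get(r["sensorID"], False)]


# Dummy data mirroring the example above.
CONFIG_TABLE = {1: True, 2: False}
MQTT_DATA = (
    [{"sensorID": 1, "data": d} for d in ("on", "on", "off", "on", "off")]
    + [{"sensorID": 2, "data": d} for d in ("on", "on", "off", "on", "off")]
)

lookups = []  # records every call made to the fake config store


def fake_fetch(sensor_ids):
    lookups.append(list(sensor_ids))
    return {sid: CONFIG_TABLE[sid] for sid in sensor_ids}


kept = filter_enabled(MQTT_DATA, fake_fetch)
print(len(kept), lookups)  # 5 [[1, 2]]
```

With all 10 records in one batch, the config store is queried once for sensors 1 and 2, and only the 5 records from sensor 1 are kept. With one record per invocation, the same function would be called 10 times and make 10 lookups, which is what we are trying to avoid.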
Thank you for your time.