I need to process data from an AWS Kinesis stream that collects events from devices. A processing function has to be called every second with all events received during the last 10 seconds.
Say I have two devices, A and B, that write events into the stream. My procedure is named MyFunction and takes the following parameters:
- DeviceId
- Array of data for a period
If I start processing at 10:00:00 (and already have accumulated events for devices A and B for the last 10 seconds) then I need to make two calls:
- MyFunction(A, {Events for device A from 09:59:50 to 10:00:00})
- MyFunction(B, {Events for device B from 09:59:50 to 10:00:00})
In the next second, at 10:00:01:
- MyFunction(A, {Events for device A from 09:59:51 to 10:00:01})
- MyFunction(B, {Events for device B from 09:59:51 to 10:00:01})
and so on.
It looks like the simplest way to accumulate the data received from devices is to store it in memory in a temporary buffer (the last 10 seconds only, of course), so I'd like to try this first.
The most convenient way I have found to keep such a memory-based buffer is to create an application based on the Java Kinesis Client Library (KCL).
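To make the buffer idea concrete, here is a rough sketch of what I have in mind. It leaves out all KCL wiring; the class name SlidingWindowBuffer and its methods add and snapshot are made up for illustration, and it assumes events for a given device arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory buffer that keeps only the last windowMillis of events per device. */
class SlidingWindowBuffer {

    /** One buffered event: its timestamp plus an opaque payload. */
    static final class Event {
        final long timestampMillis;
        final String payload;

        Event(long timestampMillis, String payload) {
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    private final long windowMillis;
    private final Map<String, Deque<Event>> eventsByDevice = new HashMap<>();

    SlidingWindowBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Called for every record read from the stream (assumed ordered by time per device). */
    synchronized void add(String deviceId, long timestampMillis, String payload) {
        eventsByDevice
                .computeIfAbsent(deviceId, id -> new ArrayDeque<>())
                .addLast(new Event(timestampMillis, payload));
    }

    /** Evicts events older than the window, then returns a copy of the remaining payloads per device. */
    synchronized Map<String, List<String>> snapshot(long nowMillis) {
        long cutoff = nowMillis - windowMillis;
        Map<String, List<String>> result = new HashMap<>();
        Iterator<Map.Entry<String, Deque<Event>>> it = eventsByDevice.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Deque<Event>> entry = it.next();
            Deque<Event> events = entry.getValue();
            // Drop everything that fell out of the window since the last call.
            while (!events.isEmpty() && events.peekFirst().timestampMillis < cutoff) {
                events.removeFirst();
            }
            if (events.isEmpty()) {
                it.remove(); // no recent events: forget the device until it reports again
                continue;
            }
            List<String> payloads = new ArrayList<>(events.size());
            for (Event e : events) {
                payloads.add(e.payload);
            }
            result.put(entry.getKey(), payloads);
        }
        return result;
    }
}
```

The record-processing code would call add for each incoming event, and a timer (for example a ScheduledExecutorService firing once per second) would call snapshot(System.currentTimeMillis()) and then invoke MyFunction once per map entry.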
I have also considered an AWS Lambda based solution, but it looks like it's impossible to keep data in memory between Lambda invocations. Another option with Lambda is to have two functions: the first one writes all the data into DynamoDB, and the second one is called every second to process data fetched from the database rather than from memory. (So this option is much more complicated.)
So my question is: what other options are there to implement such processing?