Does anyone know how to implement a sliding window using Faust?
The idea is to count the occurrences of a key over 10, 30, 60, and 300 second windows, but we need those counts refreshed every second, or ideally on every update.
I have a dodgy workaround that seems very inefficient: a tumbling 1s window with an expiry of 300s, where on each event I sum all the old values in the table up to the current one using the delta() method. It copes OK with messages from 6 sources each producing 10 messages/s, but that's about the limit before we see lag. It's obviously a slow method that can't scale up, so the question is how to achieve this without resorting to KSQL or standing up a Spark cluster alongside the Kafka cluster. We're trying to keep this simple if we can.
To complicate this, we would dearly love to have the same stats for the last 24 hours, 1 week, 1 month, and 3 months, all computed on the fly. But perhaps we're just asking way too much without a dedicated process for each input.
Here's my dodgy code:
import faust


class EventCount(faust.Record, serializer='json'):
    event_id: int
    source_id: int
    counts_10: int
    counts_30: int
    counts_60: int
    counts_300: int


@app.agent(events_topic)
async def new_event(stream):
    async for value in stream:
        # bump the count for this source in the current 1s window
        event_counts_table[value.source_id] += 1

        # walk back through the retained 1s windows and accumulate
        # the rolling counts for each horizon
        counts_10 = 0
        counts_30 = 0
        counts_60 = 0
        counts_300 = 0
        for i in range(300):
            window_count = event_counts_table[value.source_id].delta(i)
            if i <= 10:
                counts_10 += window_count
            if i <= 30:
                counts_30 += window_count
            if i <= 60:
                counts_60 += window_count
            counts_300 += window_count

        await event_counts_topic.send(
            value=EventCount(
                event_id=value.event_id,
                source_id=value.source_id,
                counts_10=counts_10,
                counts_30=counts_30,
                counts_60=counts_60,
                counts_300=counts_300,
            )
        )
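
For reference, the supporting setup is the tumbling 1-second window with a 300s expiry described above, declared roughly like this (a sketch; the app name, broker URL, and topic/table names are placeholders rather than our real config):

from datetime import timedelta

import faust

# placeholder app name and broker URL
app = faust.App('event-stats', broker='kafka://localhost:9092')

# input stream of raw events and output stream of per-event count records
events_topic = app.topic('events')
event_counts_topic = app.topic('event-counts', value_type=EventCount)  # EventCount as defined above

# 1s tumbling windows, retained for 300s so the last 300 windows
# stay readable via delta() in the agent above
event_counts_table = (
    app.Table('event_counts', default=int)
    .tumbling(timedelta(seconds=1), expires=timedelta(seconds=300))
)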