I'm trying to calculate a moving average over the last 24h of events. I'm struggling to understand how to implement the "moving" part of the calculation with Apache Beam.
My scenario is as follows:
- Given an unbounded stream of events where each event has user and value fields.
- Group the events by user to get per-user streams.
- Calculate the average per-user value over the last 24h. If I do it by hand:
  - Take the current time.
  - Find events where the event time is >= now - 24h.
  - For these events, average the value field to get a single number.
- Sink the calculated per-user average value to a database table.
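To make the by-hand steps concrete, here is a minimal plain-Python sketch of what I mean (this is not Beam code). The Event tuple, its event_time field, and the function name rolling_average_per_user are made up for illustration; only user and value come from my actual events.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape; event_time is assumed to be attached to each event.
    user: str
    value: float
    event_time: datetime


def rolling_average_per_user(
    events: Iterable[Event],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> Dict[str, float]:
    """Average `value` per user over events with event_time >= now - window."""
    cutoff = now - window
    totals = defaultdict(float)
    counts = defaultdict(int)
    for e in events:
        if e.event_time >= cutoff:
            totals[e.user] += e.value
            counts[e.user] += 1
    # Users with no events inside the window are simply absent from the result.
    return {user: totals[user] / counts[user] for user in counts}
```

The result is one number per user, which is what I would then write to the database table.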
The moving aspect here is that when an event expires (the clock ticks forward and its event time becomes < now - 24h), the average value should be recalculated.
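A small plain-Python illustration of the expiry behaviour I'm after (again not Beam; the timestamps, values, and the helper name average_at are invented for the example). No new event arrives between the two evaluations, yet the average changes because the first event drops out of the 24h range.

```python
from datetime import datetime, timedelta

# Hypothetical (event_time, value) pairs for a single user.
t0 = datetime(2024, 1, 1, 0, 0)
events = [(t0, 10.0), (t0 + timedelta(hours=12), 30.0)]


def average_at(now, events, window=timedelta(hours=24)):
    """Average of values whose event time is >= now - window, or None if none."""
    cutoff = now - window
    values = [v for (t, v) in events if t >= cutoff]
    return sum(values) / len(values) if values else None


# 23h after the first event: both events are still inside the last 24h.
before_expiry = average_at(t0 + timedelta(hours=23), events)

# 25h after the first event: the first event has expired, only the second counts.
after_expiry = average_at(t0 + timedelta(hours=25), events)
```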
What I tried:
- It isn't a FixedWindow because it's moving. I don't want to know what the daily average value is for a given day. I want to know what the current average value is within the last 24h.
- Since the user generates events randomly, the documentation suggests this could be a use case for SessionWindow, but I'm not interested in understanding user behaviour or finding sessions. I'm not sure if SessionWindow fits because it's started by event time and I'm looking for fixed clock time boundaries.
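To show why a fixed daily bucket gives a different answer from what I want, here is a rough plain-Python comparison (not Beam; the timestamps and values are invented). The calendar-day bucket only sees events from the same day as now, while the trailing 24h range also includes late events from the previous day.

```python
from datetime import datetime, timedelta

# Hypothetical (event_time, value) pairs for one user.
events = [
    (datetime(2024, 1, 1, 20, 0), 10.0),
    (datetime(2024, 1, 2, 2, 0), 30.0),
]
now = datetime(2024, 1, 2, 3, 0)


def mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


# Fixed daily bucket: only events on the same calendar day as `now`.
daily = mean([v for (t, v) in events if t.date() == now.date()])

# Trailing range: events with event time >= now - 24h.
trailing = mean([v for (t, v) in events if t >= now - timedelta(hours=24)])
```

Here daily only covers the 02:00 event, whereas trailing covers both events, and the second number is the one I'm trying to produce.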
Can someone please explain how this moving average should be implemented in Beam?
I'm new to Apache Beam and have just started. I've looked at the Windowing and Trigger documentation and reviewed the leaderboard example. So far I haven't found an example for calculating moving values.