Gurus, I am new to Apache Beam and am trying to implement what seems to be a pretty straightforward use case. I have stock data and I need to find a rolling average price of the stock over the past 10 transactions.
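To make the requirement concrete, here is a minimal plain-Python sketch of the computation I am after (no Beam involved). The function name `rolling_average` and the sample prices are made up for illustration, and I am assuming that an average over fewer than 10 prices is acceptable until 10 transactions have arrived.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_average(prices: Iterable[float], size: int = 10) -> Iterator[float]:
    """Yield the mean of the most recent `size` prices after each new price."""
    window = deque(maxlen=size)  # the oldest price is dropped automatically
    total = 0.0
    for price in prices:
        if len(window) == size:
            total -= window[0]  # this price is about to be evicted by append()
        window.append(price)
        total += price
        # Assumption: emit a partial average until `size` prices have arrived.
        yield total / len(window)


# Hypothetical prices, with a window of 3 to keep the example short.
sample_prices = [10.0, 12.0, 11.0, 13.0]
averages = list(rolling_average(sample_prices, size=3))
```

The question is how to get the same "last N records" behaviour inside a Beam pipeline rather than in a single sequential loop like this one.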
Since there is no fixed time span within which 10 transactions occur (sometimes it may be a few milliseconds, other times several seconds), I don't think I can use time-based windowing. I have two questions:
- Is this a valid use case for Beam or am I missing a point here?
- Is there a reasonably simple, legitimate, non-hack way to write a windowing function/class (in the Python SDK) that windows data based on the number of records?
I have seen recommendations to fake the timestamps on the records so that each arriving record appears to have been created, say, one second apart, but I see two problems with this:
a. This is truly a hack, which seems like a misfit for something like Beam that is supposed to be so powerful and elegantly architected.
b. What is the point of using a high-performance (serverless) Beam pipeline if you stifle its performance up front by using a program to add the fake timestamps sequentially?
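For reference, the workaround in question boils down to something like the following plain-Python sketch. The helper names `add_fake_timestamps` and `sliding_window_starts` are hypothetical, and I am assuming the fake timestamps would then be fed to sliding time windows of 10 seconds that advance every 1 second.

```python
from typing import Iterable, Iterator, List, Tuple


def add_fake_timestamps(
    records: Iterable[float], step_seconds: int = 1
) -> Iterator[Tuple[int, float]]:
    """Pair each record with a made-up timestamp, `step_seconds` apart."""
    # enumerate() is a single sequential counter: this is the bottleneck in (b).
    for index, record in enumerate(records):
        yield index * step_seconds, record


def sliding_window_starts(timestamp: int, size: int = 10, period: int = 1) -> List[int]:
    """Start times of every [start, start + size) window containing `timestamp`."""
    last_start = timestamp - (timestamp % period)
    return list(range(last_start, timestamp - size, -period))


# Hypothetical prices; each one lands in 10 overlapping 10-second windows.
stamped = list(add_fake_timestamps([10.0, 12.0, 11.0]))
starts = sliding_window_starts(stamped[2][0], size=10, period=1)
```

With one record per fake second, each 10-second window holds at most 10 consecutive records, which is how the trick imitates a count-based window. The stamping step itself, though, depends on one global counter.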
I wonder whether windowing within Beam itself would be a more elegant solution.