Is there a way in PySpark to use the calculated average value to filter data from the main stream?
I have this code to calculate the average heartbeat per lap.
from pyspark.sql.functions import avg, session_window

df = spark.readStream.format("csv").schema(schema).option("header", True).load("/content/input")

# This is the part that interests you
avg_heartbeat_rate_per_lap = df \
    .withWatermark("timestamp", "10 minutes") \
    .groupBy(
        session_window(df.timestamp, "5 minutes"),
        df.lapId) \
    .agg(avg("heartbeat"))
The code above, taken from another question, calculates the average per lap.
Can I do something like the following to filter values above the average and save them to a database?
df = df.where(df["heartbeat"] > avg_heartbeat_rate_per_lap.tail(1)["avg(heartbeat)"])
This line of code does not work, but I am looking for a similar solution.
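For context, one direction I have been experimenting with is foreachBatch, where each micro-batch arrives as a static DataFrame, so the per-lap average can be joined back onto the individual rows before writing. This is only a sketch: the JDBC URL, table name, and credentials are placeholders, and it computes the average within each micro-batch rather than with the session window above, so I am not sure it is equivalent to what I need.

from pyspark.sql.functions import avg, col

# Sketch: treat every micro-batch as a static DataFrame so the
# per-lap average can be joined back onto the individual rows.
def write_above_average(batch_df, batch_id):
    lap_avg = batch_df \
        .groupBy("lapId") \
        .agg(avg("heartbeat").alias("avg_heartbeat"))
    above_avg = batch_df \
        .join(lap_avg, on="lapId") \
        .where(col("heartbeat") > col("avg_heartbeat"))
    # Placeholder JDBC settings; replace with the real database details
    above_avg.write \
        .format("jdbc") \
        .option("url", "jdbc:postgresql://localhost:5432/telemetry") \
        .option("dbtable", "above_average_heartbeats") \
        .option("user", "user") \
        .option("password", "password") \
        .mode("append") \
        .save()

query = df.writeStream \
    .foreachBatch(write_above_average) \
    .start()

Is this the right direction, or is there a way to filter the main stream directly against the session-window average?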