
Is there a way in PySpark to use an aggregated average value to filter data from the main stream?

I have this code to calculate the average heartbeat per lap.

from pyspark.sql.functions import avg, session_window

# Read the raw telemetry stream from CSV files
df = spark.readStream.format("csv").schema(schema).option("header", True).load("/content/input")

# This is the part that interests you: average heartbeat per lap,
# grouped into 5-minute session windows with a 10-minute watermark
avg_heartbeat_rate_per_lap = df \
    .withWatermark("timestamp", "10 minutes") \
    .groupBy(
        session_window(df.timestamp, "5 minutes"),
        df.lapId) \
    .agg(avg("heartbeat"))

The code above calculates the average heartbeat per lap.

Can I do something like the following to filter the values above the average and save them to a database?

df = df.where(df["heartbeat"] > avg_heartbeat_rate_per_lap.tail(1)["avg(heartbeat)"])

This line of code does not work, but I am looking for a similar solution; a rough sketch of what I am trying to achieve is below.
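For illustration only, here is a minimal sketch of the kind of thing I have in mind, using foreachBatch so that a per-lap average computed in each micro-batch can be joined back onto the raw rows. The JDBC URL, table name, and credentials are placeholders, not part of my actual setup, and this uses only the per-batch average per lap rather than the session-window average from above, so it may not be exactly what I need.

from pyspark.sql.functions import avg, col

# Sketch only: filter each micro-batch against its own per-lap average
# and append the rows above that average to a database via JDBC.
def write_above_avg(batch_df, batch_id):
    # Average heartbeat per lap within this micro-batch
    lap_avg = batch_df.groupBy("lapId").agg(avg("heartbeat").alias("avg_heartbeat"))

    # Keep only the rows whose heartbeat is above their lap's average
    above_avg = (batch_df
                 .join(lap_avg, on="lapId")
                 .where(col("heartbeat") > col("avg_heartbeat")))

    (above_avg.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/telemetry")  # placeholder
        .option("dbtable", "above_avg_heartbeats")                    # placeholder
        .option("user", "user")                                       # placeholder
        .option("password", "password")                               # placeholder
        .mode("append")
        .save())

query = df.writeStream.foreachBatch(write_above_avg).start()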
