I'm currently struggling with the following:
The z-score is defined as:
z = (x-u)/sd
(where x is the individual value, u the mean of the window and sd the standard deviation of the window)
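To make the target concrete, here is a minimal plain-Scala sketch (no Spark) of what I would like to get for the values of a single window. The object name ZScoreSketch, the helper zScores and the sample values are made up for illustration, and I'm assuming the sample standard deviation (n - 1 denominator), which as far as I know is what Spark's stddev computes.

```scala
// Plain Scala, no Spark: z-scores for the values of one window.
// ZScoreSketch and zScores are hypothetical names used only for illustration.
object ZScoreSketch {
  // Returns (x - u) / sd for each x, where u is the mean and sd the
  // sample standard deviation (n - 1 denominator) of the given values.
  def zScores(xs: Seq[Double]): Seq[Double] = {
    require(xs.size > 1, "need at least two values for a sample standard deviation")
    val u = xs.sum / xs.size
    val sd = math.sqrt(xs.map(x => (x - u) * (x - u)).sum / (xs.size - 1))
    // Note: if all values are equal, sd is 0 and every result is NaN.
    xs.map(x => (x - u) / sd)
  }

  def main(args: Array[String]): Unit = {
    val window = Seq(1.0, 2.0, 3.0) // made-up sample values
    println(zScores(window)) // prints List(-1.0, 0.0, 1.0)
  }
}
```

The point is that every input value x shows up again in the output, next to the mean and standard deviation of its window.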
I can calculate u and sd on the window, but I don't know how to "carry over" each individual x value to the resulting DataFrame in order to calculate the z-score for every value. This is how far I've got:
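import org.apache.spark.sql.functions.{avg, stddev, window}
import spark.implicits._ // enables the $"column" syntax (spark is the SparkSession)

// Read the MQTT stream through the Apache Bahir connector
val df = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", "topic/path")
  .load("tcp://localhost:1883")

// Aggregate over 2-second windows
val counter = df.groupBy(
    window($"timestamp", "2 seconds"),
    $"value")
  .agg($"value", avg($"value") + stddev($"value"))

// Print the full result table to the console on every trigger
val query = counter.writeStream
  .outputMode("complete")
  .format("console")
  .start()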
My hope was that $"value" in .agg($"value",avg($"value")+stddev($"value")) would carry over each value from the source DataFrame to the result, but this is not the case.
Any ideas?