I have a Structured Streaming DataFrame tempDataFrame2 containing a column Field1. I am trying to calculate the approxQuantile of Field1. However, whenever I run
val Array(Q1, Q3) = tempDataFrame2.stat.approxQuantile("Field1", Array(0.25, 0.75), 0.0)
I get the following error message:
Queries with streaming sources must be executed with writeStream.start()
Below is the code snippet:
val tempDataFrame2: DataFrame = ??? // a Structured Streaming DataFrame (source omitted)
// Calculate IQR
val Array(Q1, Q3) = tempDataFrame2.stat.approxQuantile("Field1", Array(0.25, 0.75), 0.0)
// Filter messages
val tempDataFrame3 = tempDataFrame2.filter("Some working filter")
val query = tempDataFrame2.writeStream.outputMode("append").queryName("table").format("console").start()
query.awaitTermination()
I have already gone through these two links from SO: Link1 Link2. Unfortunately, I am not able to relate those answers to my problem.
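For context on the error: approxQuantile is an eager action that runs a batch job immediately, and a streaming DataFrame can only be executed through writeStream.start(). One possible workaround, sketched below under assumptions, is to compute the quartiles per micro-batch inside foreachBatch (available from Spark 2.4), where each batch is an ordinary static DataFrame. The "rate" source is only a stand-in for the real stream, the object and helper names are hypothetical, and the quartiles describe a single micro-batch, not the whole stream.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object QuartilesPerBatch {
  // Quartiles of Field1 for one static DataFrame (relativeError = 0.0 asks for exact values).
  // Returns None if approxQuantile does not yield exactly two values, e.g. for an empty batch.
  def quartiles(df: DataFrame): Option[(Double, Double)] =
    df.stat.approxQuantile("Field1", Array(0.25, 0.75), 0.0) match {
      case Array(q1, q3) => Some((q1, q3))
      case _             => None
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("quartiles-per-batch")
      .master("local[*]")
      .getOrCreate()

    // Stand-in for tempDataFrame2 (assumption): Spark's built-in "rate" test source.
    val tempDataFrame2 = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .selectExpr("CAST(value AS DOUBLE) AS Field1")

    // Explicitly typed as a Scala function so the Scala overload of foreachBatch is chosen.
    val processBatch: (DataFrame, Long) => Unit = (batchDF, batchId) =>
      quartiles(batchDF) match {
        case Some((q1, q3)) => println(s"batch $batchId: Q1=$q1, Q3=$q3")
        case None           => println(s"batch $batchId: no quartiles (empty batch?)")
      }

    val query = tempDataFrame2.writeStream.foreachBatch(processBatch).start()
    query.awaitTermination()
  }
}
```

Whether per-batch quartiles are acceptable depends on the use case; if the IQR has to cover all records, the data needs to be read as a bounded (batch) DataFrame first.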
Edit
After reading the comments, this is how I am planning to proceed:
1) Read all the uncommitted offsets from the Kafka topic.
2) Save them to a dataframe variable.
3) Stop the structured streaming query so that I don't read from the Kafka topic anymore.
4) Start processing the saved dataframe from step 2).
However, I am not sure how to proceed:
1) How do I know that there are no more records to consume from the Kafka topic, so that I can stop the streaming query?
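One way the plan might be approached without detecting the end of the topic by hand is to read Kafka as a batch source: spark.read (instead of spark.readStream) with endingOffsets set to "latest" returns a bounded, static DataFrame on which approxQuantile can run directly. The following is only a minimal sketch. The broker address, topic name, and the assumption that the message value is numeric text are all hypothetical, and the spark-sql-kafka-0-10 package has to be on the classpath. Note also that a batch read starts from startingOffsets and does not consult offsets committed by a consumer group, so tracking "uncommitted" records would have to be handled separately.

```scala
import org.apache.spark.sql.SparkSession

object BoundedKafkaRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("bounded-kafka-read")
      .master("local[*]")
      .getOrCreate()

    // Batch (non-streaming) read: the result is a static DataFrame that ends
    // at the offsets that were latest when the query was planned.
    val saved = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "my-topic")                     // hypothetical topic
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()
      // Assumes each message value is a number encoded as text.
      .selectExpr("CAST(CAST(value AS STRING) AS DOUBLE) AS Field1")

    // Allowed here because `saved` is not a streaming DataFrame.
    val quartiles = saved.stat.approxQuantile("Field1", Array(0.25, 0.75), 0.0)
    println(quartiles.mkString("Q1/Q3: ", ", ", ""))

    spark.stop()
  }
}
```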