I am going through Spark Structured Streaming and have run into a problem.
In StreamingContext (DStreams), we can define a batch interval as follows:
from pyspark.streaming import StreamingContext
ssc = StreamingContext(sc, 5) # 5 second batch interval
How to do this in Structured Streaming?
My Structured Streaming code is something like:
from pyspark.sql import SparkSession

sparkStreaming = SparkSession \
    .builder \
    .appName("StreamExample1") \
    .getOrCreate()
stream_df = sparkStreaming.readStream \
    .schema("col0 STRING, col1 INTEGER") \
    .option("maxFilesPerTrigger", 1) \
    .csv("C:/sparkStream")
sql1 = stream_df.groupBy("col0").sum("col1")
query = sql1.writeStream \
    .queryName("stream1") \
    .outputMode("complete") \
    .format("memory") \
    .start()
This code works as expected, but how/where do I define the batch interval here?
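While reading the documentation I came across the trigger option on writeStream, and my guess (I am not sure this is right) is that a processing-time trigger is the closest analogue to the DStream batch interval, something like:
# My assumption: this would fire a micro-batch every 5 seconds, like the DStream interval?
query = sql1.writeStream \
    .trigger(processingTime="5 seconds") \
    .queryName("stream1") \
    .outputMode("complete") \
    .format("memory") \
    .start()
Is that the right way to think about it, or does the maxFilesPerTrigger option on readStream already control the batching here?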
I am new to Structured Streaming, so please guide me.