I need to execute some functions based on the values that I receive from topics. I'm currently using ForeachWriter to convert all the topics to a List. Now, I want to pass this List as a parameter to the methods.
This is what I have so far:
def doA(mylist: List[String]): Unit = { /* something for A */ }
def doB(mylist: List[String]): Unit = { /* something for B */ }
And this is how I call my streaming queries:
//{"s":"a","v":"2"}
//{"s":"b","v":"3"}
val readTopics = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "myTopic")
  .load()
val schema = new StructType()
  .add("s", StringType)
  .add("v", StringType)
val parseStringDF = readTopics.selectExpr("CAST(value AS STRING)")
val parseDF = parseStringDF.select(from_json(col("value"), schema).as("data"))
  .select("data.*")
parseDF.writeStream
  .format("console")
  .outputMode("append")
  .start()
//fails here
val listOfTopics = parseDF.select("s").map(row => (row.getString(0))).collect.toList
// unable to call the methods below
for (t <- listOfTopics) {
  if (t == "a")
    doA(listOfTopics)
  else if (t == "b")
    doB(listOfTopics)
  else
    println("do nothing")
}
spark.streams.awaitAnyTermination()
Questions:
- How can I call a stand-alone (non-streaming) method in a streaming job?
- I cannot use ForeachWriter here because I want to pass a SparkSession to the methods, and SparkSession is not serializable. What are the alternatives for calling the methods doA and doB in parallel?
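
One direction that may fit both questions is `foreachBatch` (available from Spark 2.4), sketched below. Its function runs on the driver and receives each micro-batch as a static DataFrame, so `collect` is allowed there and a SparkSession can be reached through `batch.sparkSession`. This is only a sketch under assumptions: the `SparkSession` parameter added to `doA` and `doB`, the object name `TopicDispatchSketch`, the `handleBatch` helper, and the `local[*]` master are all hypothetical, and the Kafka source is assumed to need the `spark-sql-kafka-0-10` package on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType}

object TopicDispatchSketch {
  // Placeholder bodies; the SparkSession parameter is an assumption taken from the question.
  def doA(spark: SparkSession, keys: List[String]): Unit = println(s"doA called with $keys")
  def doB(spark: SparkSession, keys: List[String]): Unit = println(s"doB called with $keys")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("topic-dispatch-sketch")
      .master("local[*]") // assumption: local test run
      .getOrCreate()

    val schema = new StructType()
      .add("s", StringType)
      .add("v", StringType)

    // Same source and parsing steps as in the question.
    val parseDF = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "myTopic")
      .load()
      .selectExpr("CAST(value AS STRING)")
      .select(from_json(col("value"), schema).as("data"))
      .select("data.*")

    // Explicitly typed so the Scala overload of foreachBatch is selected.
    val handleBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
      // `batch` is a static DataFrame here, so collect() is permitted.
      val keys = batch.select("s").collect().map(_.getString(0)).toList
      val session = batch.sparkSession // driver-side session, nothing is serialized
      keys.foreach {
        case "a" => doA(session, keys)
        case "b" => doB(session, keys)
        case _   => println("do nothing")
      }
    }

    parseDF.writeStream
      .foreachBatch(handleBatch)
      .start()

    spark.streams.awaitAnyTermination()
  }
}
```

In this sketch `doA` and `doB` run one after another within each batch. If they really need to run in parallel, wrapping the calls in `scala.concurrent.Future` on the driver and awaiting them before the batch function returns might work, though that is untested here.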