I want to apply a matching algorithm to live data arriving in Kafka. I initially load the table that the stream should be compared against, and I need to check whether each incoming record matches at least one record of that loaded table. The table is loaded into a DataFrame (initial_df). Note that the algorithm uses many filters and functions to find a match. So my question is: how can I access a DataFrame defined outside the foreach function from inside it, as in this code:
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka:29092") \
    .option("subscribe", "test1") \
    .option("startingOffsets", "latest") \
    .load()
result = df.selectExpr("CAST(value AS STRING)") \
    .writeStream \
    .foreach(lambda row: match(initial_df, row)) \
    .format("console") \
    .start()
When I run this code, I get the error: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
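
For context, the error is raised because the lambda passed to foreach captures initial_df, and a DataFrame holds a reference to the SparkContext, which cannot be shipped to workers. A minimal sketch of one possible workaround follows, assuming the loaded table is small enough to collect to the driver. The helper names (load_reference_rows, start_matching), the "name" column, and the exact-equality rule inside match are hypothetical placeholders for illustration, not the real matching logic.

```python
def load_reference_rows(initial_df):
    """Runs on the driver: copy the (assumed small) table into plain dicts."""
    return [row.asDict() for row in initial_df.collect()]


def match(reference_rows, value):
    """Hypothetical stand-in for the real algorithm.

    Returns True if `value` equals the assumed "name" field of at least
    one reference record.
    """
    return any(ref.get("name") == value for ref in reference_rows)


def start_matching(df, reference_rows):
    """Start the streaming query with a foreach sink."""

    def process_row(row):
        # Runs on the workers. It closes over a plain Python list,
        # not a DataFrame, so no SparkContext is referenced here.
        if match(reference_rows, row.value):
            print("matched:", row.value)

    return (
        df.selectExpr("CAST(value AS STRING)")
        .writeStream
        .foreach(process_row)
        .start()
    )
```

With this shape, the call would look like start_matching(df, load_reference_rows(initial_df)). If the matching genuinely needs DataFrame operations (filters, joins) against initial_df, foreachBatch may be a better fit, since the function given to it receives each micro-batch as a DataFrame and runs on the driver.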