I am currently working on a use case where I am writing a PySpark DataFrame to a Confluent Kafka topic.
def write_data(rows):
    rows.selectExpr("to_json(struct(*)) AS value") \
        .write \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "xxx.aws.confluent.cloud:9092") \
        .option("topic", "test_topic") \
        .save()

dataframe.foreachPartition(write_data)
Below is the error that I'm getting.
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
File "<command-2315>", line 32, in write_data
AttributeError: 'itertools.chain' object has no attribute 'selectExpr'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:514)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:650)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:633)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:468)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
The auth enabled on the topic is `SASL PLAIN`. From the traceback, it seems that `foreachPartition` hands my function an iterator of rows rather than a DataFrame, which is why `selectExpr` fails. I wanted to know: is my approach of writing the DataFrame to a Confluent Kafka topic correct, or do I need to add other configs as well?
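For reference, this is roughly what I assume a direct DataFrame write with the SASL options would look like, based on the Spark Kafka connector's convention of passing Kafka client properties through with a `kafka.` prefix. The API key and secret are placeholders, and I haven't confirmed this works:

# Untested sketch: writing the DataFrame directly, without foreachPartition.
# The kafka.-prefixed options are forwarded to the underlying Kafka client;
# <API_KEY> and <API_SECRET> are placeholders for the Confluent Cloud credentials.
dataframe.selectExpr("to_json(struct(*)) AS value") \
    .write \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "xxx.aws.confluent.cloud:9092") \
    .option("kafka.security.protocol", "SASL_SSL") \
    .option("kafka.sasl.mechanism", "PLAIN") \
    .option("kafka.sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            "username='<API_KEY>' password='<API_SECRET>';") \
    .option("topic", "test_topic") \
    .save()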
I'm new to Spark. Any help would be appreciated.