I am trying to write a Spark DataFrame (a batch DF) to Kafka, and I need to write the data to specific partitions.
I tried the following code:
myDF.write
  .format("kafka")
  .option("kafka.bootstrap.servers", kafkaProps.getBootstrapServers)
  .option("kafka.security.protocol", "SSL")
  .option("kafka.truststore.location", kafkaProps.getTrustStoreLocation)
  .option("kafka.truststore.password", kafkaProps.getTrustStorePassword)
  .option("kafka.keystore.location", kafkaProps.getKeyStoreLocation)
  .option("kafka.keystore.password", kafkaProps.getKeyStorePassword)
  .option("kafka.partitioner.class", "util.MyCustomPartitioner")
  .option("topic", kafkaProps.getTopicName)
  .save()
The schema of the DF I am writing is:
+---+---------+-----+
|key|partition|value|
+---+---------+-----+
+---+---------+-----+
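To make the column types explicit, here is a minimal sketch of how the DF could be shaped before the write. It assumes (this is my reading of the Spark Kafka integration guide, not something I have confirmed) that the sink wants "key" and "value" as string or binary and "partition" as int, and that the Spark version in use recognises the "partition" column at all. The helper name toKafkaShape is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: cast the three columns to the types the Kafka sink
// is assumed to expect (string key/value, int partition).
def toKafkaShape(df: DataFrame): DataFrame =
  df.select(
    col("key").cast("string").as("key"),
    col("partition").cast("int").as("partition"),
    col("value").cast("string").as("value")
  )

// Inspect the resulting schema before calling .write on it.
toKafkaShape(myDF).printSchema()
```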
I had to repartition "myDF" to a single partition because I need to order the data by a date column.
The write sends all the data to a single Kafka partition, but not the one given in the DF's "partition" column or the one returned by the custom partitioner (which returns the same value as the partition column).
Thanks,
Sateesh