I have installed Kafka locally (no cluster/Schema Registry for now) and am trying to produce messages to an Avro topic. Below is the schema associated with that topic:
{
  "type" : "record",
  "name" : "Customer",
  "namespace" : "com.example.Customer",
  "doc" : "Class: Customer",
  "fields" : [ {
    "name" : "name",
    "type" : "string",
    "doc" : "Variable: Customer Name"
  }, {
    "name" : "salary",
    "type" : "double",
    "doc" : "Variable: Customer Salary"
  } ]
}
I would like to create a simple SparkProducerApi that generates some data based on the above schema and publishes it to Kafka. My plan is to create sample data, convert it to a DataFrame, serialize it to Avro, and then publish it:
val df = spark.createDataFrame(<<data>>)
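For example, this is roughly how I'm planning to build the sample data (the Customer case class and the values are just placeholders I made up):

import org.apache.spark.sql.SparkSession

// Case class mirroring the fields in the Avro schema
case class Customer(name: String, salary: Double)

val spark = SparkSession.builder()
  .appName("CustomerAvroProducer")   // placeholder app name
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Placeholder sample rows
val df = Seq(
  Customer("Alice", 55000.0),
  Customer("Bob", 72000.0)
).toDF()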
And then, something like below:
df.write
.format("kafka")
.option("kafka.bootstrap.servers","localhost:9092")
.option("topic","customer_avro_topic")
.save()
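To fill in the Avro step, here is a rough end-to-end sketch of what I have in mind, assuming Spark 3.x with the spark-avro and spark-sql-kafka-0-10 packages on the classpath. From what I understand, to_avro from spark-avro can encode a struct of columns into a single binary column, and the Kafka sink expects that column to be named "value". Since I have no Schema Registry, plain Avro bytes (no Confluent wire format) are fine for now. I'm not sure this is the idiomatic way, hence the question:

import org.apache.spark.sql.avro.functions.to_avro
import org.apache.spark.sql.functions.{col, struct}

// The Avro schema from above, inlined as a JSON string (doc fields omitted)
val customerSchema =
  """
    |{
    |  "type": "record",
    |  "name": "Customer",
    |  "namespace": "com.example.Customer",
    |  "fields": [
    |    {"name": "name",   "type": "string"},
    |    {"name": "salary", "type": "double"}
    |  ]
    |}
  """.stripMargin

// Pack all fields into one struct and Avro-encode it into the
// "value" column that the Kafka sink requires
val kafkaDf = df.select(
  to_avro(struct(col("name"), col("salary")), customerSchema).as("value")
)

kafkaDf.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "customer_avro_topic")
  .save()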
Attaching the schema to this Avro topic can be done manually for now. Can this be done using only Apache Spark APIs, instead of the Java/Kafka APIs? This is for batch processing, not streaming.