
I want to write a Spark Streaming job that reads from Kafka and writes to Elasticsearch, and I want to detect the schema dynamically while reading from Kafka.

Can you help me do that?

I know this can be done in Spark batch processing via the line below:

val schema = spark.read.json(dfKafkaPayload.select("value").as[String]).schema

But we cannot do the same in a Spark Streaming job, since a streaming query can have only one action.

Please let me know.

Siva Samraj

1 Answer


If you are reading from a Kafka topic, you cannot rely on Spark to automatically infer the JSON schema: the Kafka source always delivers raw bytes, and inferring the schema on every micro-batch would be far too slow. So you need to provide the schema to your application yourself.
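For the Kafka case, the usual options are to declare a StructType by hand, or to infer the schema once with a one-off batch read and reuse it in the streaming query. Below is a minimal sketch of the second option; the broker address, topic name ("events"), and Elasticsearch index are assumptions, and the "es" sink requires the elasticsearch-spark connector on the classpath.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

val spark = SparkSession.builder().appName("kafka-to-es").getOrCreate()
import spark.implicits._

// 1) One-off batch read of the topic, used only to infer the JSON schema.
val batch = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("subscribe", "events")                       // assumed topic
  .option("startingOffsets", "earliest")
  .load()
val schema = spark.read.json(batch.selectExpr("CAST(value AS STRING)").as[String]).schema

// 2) Streaming read that parses the Kafka bytes with the pre-computed schema.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
  .select(from_json(col("value").cast("string"), schema).as("data"))
  .select("data.*")

// 3) Write to Elasticsearch (assumes the elasticsearch-spark connector).
stream.writeStream
  .format("es")
  .option("checkpointLocation", "/tmp/kafka-to-es-ckpt")
  .start("my-index") // hypothetical index name
  .awaitTermination()

Note that the schema is frozen at start-up: fields the producer adds later are silently dropped by from_json, so this is "dynamic" only across restarts, not within a running query.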

If you are reading from a file source, though, you can enable schema inference:

spark.conf.set("spark.sql.streaming.schemaInference", "true")
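For completeness, a minimal file-source sketch (the input path "/data/in" is a hypothetical placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("file-stream").getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", "true")

// With schemaInference enabled, Spark infers the schema from the
// files already present in the directory at start-up.
val df = spark.readStream.json("/data/in")
df.writeStream.format("console").start().awaitTermination()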
Enes Uğuroğlu
  • The question states the data is from a Kafka source, not a file. The Kafka source is always bytes. – OneCricketeer Dec 15 '21 at 14:59
  • Hello OneCricketeer, I already posted my answer about the Kafka source; the file source was only additional information for ad-hoc schema inference :) – Enes Uğuroğlu Dec 15 '21 at 15:09
  • Sorry, got confused when the answer included a file source... In any case, I think the true answer is to not put "dynamic" JSON into a Kafka topic at all; the producer's schema should remain consistent. – OneCricketeer Dec 15 '21 at 15:13