
I followed this document and it works well. Now I'm trying to consume the connector data from Spark. Is there any reference I can use? Since I use Confluent, it's quite different from the original Kafka reference documentation.

Here is the code I've used so far. The problem is that I cannot convert the record data to java.lang.String (and I'm not sure it's the right way to consume it).

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val brokers = "http://127.0.0.1:9092"
val topics = List("postgres-accounts2")

val sparkConf = new SparkConf().setAppName("KafkaWordCount")
//sparkConf.setMaster("spark://sda1:7077,sda2:7077")
sparkConf.setMaster("local[2]")
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Register the Avro record class with Kryo (GenericData.Record is the
// nested class that appears as GenericData$Record at runtime)
sparkConf.registerKryoClasses(Array(classOf[org.apache.avro.generic.GenericData.Record]))

val ssc = new StreamingContext(sparkConf, Seconds(2))
ssc.checkpoint("checkpoint")

// Create a direct Kafka stream with the brokers and topics
val kafkaParams = Map[String, Object](
  "schema.registry.url" -> "http://127.0.0.1:8081",
  // note: bootstrap.servers normally takes plain host:port, without a scheme
  "bootstrap.servers" -> "http://127.0.0.1:9092",
  "key.deserializer" -> "io.confluent.kafka.serializers.KafkaAvroDeserializer",
  "value.deserializer" -> "io.confluent.kafka.serializers.KafkaAvroDeserializer",
  "group.id" -> "use_a_separate_group_id_for_each_stream",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val messages = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

val data = messages.map { record =>
  println(record)
  // throws java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record
  // cannot be cast to java.lang.String
  println("value : " + record.value().toString())
  //println(Json.parse(record.value() + ""))

  (record.key, record.value)
}
  • As you seem to be at the beginning of your work with Spark and its streaming capabilities, let me ask why you use Spark Streaming and not Spark Structured Streaming? See http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html – Jacek Laskowski Nov 13 '17 at 10:46
  • Since I used the JDBC connector from Confluent, I think it should be Structured Streaming, shouldn't it? – J.Done Nov 14 '17 at 00:15
  • No idea about the JDBC connector from Confluent, but I'm sure you should be using Spark Structured Streaming as the streaming solution in Spark. – Jacek Laskowski Nov 14 '17 at 05:29
  • Solved it by taking the data as the record type :) – J.Done Nov 14 '17 at 05:41
  • Could you then answer your own question and accept it? I'm looking forward to seeing the change. – Jacek Laskowski Nov 14 '17 at 08:07
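
For reference, here is a minimal sketch of the Structured Streaming route Jacek suggests. It assumes Spark 2.2+ with the spark-sql-kafka-0-10 package on the classpath; the Kafka source hands back the Avro payload as raw bytes, so decoding it against the Schema Registry is a separate step that this sketch leaves out:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("KafkaStructuredStreaming")
  .getOrCreate()

// Read the topic as a streaming DataFrame
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "127.0.0.1:9092")
  .option("subscribe", "postgres-accounts2")
  .option("startingOffsets", "earliest")
  .load()

// key and value arrive as binary columns; print the raw rows to the console
val query = df.selectExpr("topic", "partition", "offset", "value")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()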

1 Answer


Sync the stream's value type to what the deserializer actually returns, as below. It will then provide the proper functions and type.

KafkaUtils.createDirectStream[String, GenericRecord]
– J.Done
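
Fleshed out, the fix looks roughly like this (a sketch, not the poster's verbatim code: it reuses ssc, topics and kafkaParams from the question, and assumes the keys are plain strings or null):

import org.apache.avro.generic.GenericRecord
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Type the value as GenericRecord, which is what KafkaAvroDeserializer
// actually returns for Avro-encoded messages
val messages = KafkaUtils.createDirectStream[String, GenericRecord](
  ssc,
  PreferConsistent,
  Subscribe[String, GenericRecord](topics, kafkaParams)
)

val data = messages.map { record =>
  val value = record.value() // an org.apache.avro.generic.GenericRecord
  // toString renders the record as a JSON-like string; individual fields
  // can be read by name with value.get("someField")
  (record.key, value.toString)
}

If the topic's keys are Avro-encoded as well, type them as GenericRecord in the same way.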