
I am trying to run a Spark Streaming application with Kafka on YARN, and I am getting the following stack trace error:

Caused by: org.apache.kafka.common.config.ConfigException: Missing required configuration "partition.assignment.strategy" which has no default value.
    at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)
    at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:48)
    at org.apache.kafka.clients.consumer.ConsumerConfig.<init>(ConsumerConfig.java:194)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:380)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:363)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:350)
    at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.<init>(CachedKafkaConsumer.scala:45)
    at org.apache.spark.streaming.kafka010.CachedKafkaConsumer$.get(CachedKafkaConsumer.scala:194)
    at org.apache.spark.streaming.kafka010.KafkaRDDIterator.<init>(KafkaRDD.scala:252)
    at org.apache.spark.streaming.kafka010.KafkaRDD.compute(KafkaRDD.scala:212)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)

Here is a snippet of my code showing how I create the Kafka stream with Spark Streaming:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val ssc = new StreamingContext(sc, Seconds(60))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "*boorstrap_url:port*",
  "security.protocol" -> "SASL_PLAINTEXT",
  "sasl.kerberos.service.name" -> "kafka",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "annotation-test",
  // Tried both commenting and uncommenting this property
  //"partition.assignment.strategy" -> "org.apache.kafka.clients.consumer.RangeAssignor",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean))

val topics = Array("*topic-name*")

val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams))
val valueKafka = kafkaStream.map(record => record.value())

I have gone through the following posts:

  1. https://issues.apache.org/jira/browse/KAFKA-4547
  2. Pyspark Structured Streaming Kafka configuration error

According to these, I have updated the Kafka client jar in my fat jar from the 0.10.1.0 version (packaged by default as a transitive dependency of spark-streaming-kafka) to 0.10.2.0. Also, the job works fine when I run it on a single node with master set to local. I am running Spark 2.3.1.
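For illustration, forcing the newer client into the fat jar can look roughly like this in an sbt build (a sketch only; I am assuming the spark-streaming-kafka-0-10 artifact and sbt, so adjust for your own build tool and versions):

// build.sbt sketch: artifact names and versions are assumptions, adjust to your setup
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.3.1" % "provided",
  // the Kafka integration pulls in kafka-clients 0.10.1.0 transitively, so exclude it here
  ("org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.1")
    .exclude("org.apache.kafka", "kafka-clients"),
  // and pin the client version explicitly
  "org.apache.kafka" % "kafka-clients" % "0.10.2.0"
)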

Y0gesh Gupta
  • Can you try changing your strategy to "org.apache.kafka.clients.consumer.RoundRobinAssignor", or, instead of "partition.assignment.strategy", try setting "consumer.partition.assignment.strategy"? – suraj_fale Mar 13 '19 at 15:32
  • I tried that, and then I get the error: java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.assign(Ljava/util/Collection;)V – Y0gesh Gupta Mar 13 '19 at 15:34
  • Try setting "consumer.partition.assignment.strategy" instead of "partition.assignment.strategy". – suraj_fale Mar 13 '19 at 15:35
  • I am getting the same error: Missing required configuration "partition.assignment.strategy" which has no default value. – Y0gesh Gupta Mar 13 '19 at 15:52
  • I think this question might be useful: https://stackoverflow.com/questions/43035542/spark-kafka-0-10-nosuchmethoderror-org-apache-kafka-clients-consumer-kafkaconsum – Bartosz Wardziński Mar 13 '19 at 16:31
  • Yes, I tried that, but I do not have any other Kafka client in my environment. I am also submitting the Spark job with --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true so that classes in my fat jar are checked first. – Y0gesh Gupta Mar 13 '19 at 17:16

1 Answer


Add kafka-clients-*.jar to your Spark jars folder. You can find kafka-clients-*.jar in the kafka-*/libs directory of your Kafka installation.
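If you want to confirm which kafka-clients jar the executors actually load (rather than another copy somewhere on the classpath), a small check like the following can be run from inside the job; it assumes sc is the SparkContext from the question:

// Debugging sketch: print the jar location that ConsumerConfig is loaded from on the executors
val clientJars = sc.parallelize(1 to sc.defaultParallelism).map { _ =>
  Option(classOf[org.apache.kafka.clients.consumer.ConsumerConfig]
      .getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("unknown code source")
}.distinct().collect()
clientJars.foreach(println)

If this prints a 0.10.1.0 jar, or a path outside your fat jar and the Spark jars folder, the executors are still picking up the old client.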

double-beep