We are using Spark Streaming to read from and write to Kafka, via the KafkaUtils library in spark-streaming_2.11, which pulls in the Kafka 0.10.0 libraries. Right now I am in the process of upgrading the kafka-clients jar to 0.11 to use a feature we need, but since the Spark Streaming integration ships with Kafka 0.10, the newer client is not getting used. I tried to exclude it from the Spark Streaming dependency, but the build then complains about the KafkaUtils class below, which is in 0.10 only. Even Spark 2.3 comes bundled with Kafka 0.10 (spark-streaming-kafka-0-10_2.11-2.3.0.cloudera2.jar). How can I get rid of this dependency on KafkaUtils in 0.10?

SCALA CODE for spark streaming Direct Stream creation

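    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // ssc, topicSet and conf.kafkaParams are defined elsewhere in the application
    val directKafkaStream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topicSet, conf.kafkaParams))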

POM.XML

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.0.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.1.0.cloudera1</version>
        <scope>provided</scope>
    </dependency>

Ajith Kannan
  • What is the Spark Kafka dependency that you are using for migrating to Kafka broker 0.11? Also, as per the official doc, you should not be importing org.apache.kafka dependencies yourself. https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html – Rajan Prasad Mar 24 '20 at 03:52
  • Try removing kafka-clients from the pom; it is not necessary. – Gokulraj Mar 24 '20 at 04:44
  • Note: DirectStream is deprecated as of Spark 2.4. Use `spark-sql-kafka` instead. – OneCricketeer Mar 24 '20 at 23:12

1 Answer

Try removing the "kafka-clients" dependency, as it is not necessary.

Also, if you are using Apache Spark, make sure to use the dependency below.

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
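
After changing the POM, it can help to check which kafka-clients version actually ends up on the classpath. The following is a minimal sketch, assuming kafka-clients is available (it normally arrives transitively with spark-streaming-kafka-0-10). The object name `KafkaClientVersionCheck` is made up for illustration; `AppInfoParser` is a utility class shipped inside kafka-clients.

```scala
import org.apache.kafka.common.utils.AppInfoParser

// Hypothetical helper object (not part of the original project):
// prints the version of the kafka-clients jar that was loaded.
object KafkaClientVersionCheck {
  def main(args: Array[String]): Unit = {
    // getVersion() reads the version recorded inside the kafka-clients jar
    val version = AppInfoParser.getVersion()
    println(s"kafka-clients version on classpath: $version")
  }
}
```

At build time, `mvn dependency:tree -Dincludes=org.apache.kafka` should list the Kafka artifacts that Maven resolved, which makes it easier to see which dependency brings in which client version.
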
Gokulraj
  • Thanks, I added this, but it also pulls in Kafka 0.10. My intent is to use Kafka 0.11, as it has the feature we need. – Ajith Kannan Mar 24 '20 at 18:26
  • @AjithKannan The Spark documentation explicitly says not to add kafka-clients yourself. The latest `spark-sql-kafka-0-10` uses `2.x` clients anyway. 0.10 just means 0.10 **and higher** – OneCricketeer Mar 24 '20 at 23:11