
My Spark version is 1.6.2, and my Kafka version is 0.10.1.0. I want to send a custom object as the Kafka value type, push this custom object into a Kafka topic, and then read the data with Spark Streaming using the direct approach. The following is my code:

import com.xxxxx.kafka.{KafkaJsonDeserializer, KafkaObjectDecoder, pharmacyData}
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object sparkReadKafka {
  val sparkConf = new SparkConf().setAppName("SparkReadKafka")
  val sc = new SparkContext(sparkConf)
  val ssc = new StreamingContext(sc, Seconds(1))

  def main(args: Array[String]): Unit = {
    val kafkaParams = Map[String, Object] (
      "bootstrap.servers" -> "kafka.kafka-cluster-shared.non-prod-5-az-scus.prod.us.xxxxx.net:9092",
      //"key.deserializer" -> classOf[StringDeserializer],
      //"value.deserializer" -> classOf[KafkaJsonDeserializer],
      "group.id" -> "consumer-group-2",
      "auto.offset.reset" -> "earliest",
      "auto.commit.interval.ms" -> "1000",
      "enable.auto.commit" -> (false: java.lang.Boolean),
      "session.timeout.ms" -> "30000"
    )

    val topic = "hw_insights"

    val stream = KafkaUtils.createDirectStream[String, pharmacyData, StringDecoder, KafkaObjectDecoder](ssc, kafkaParams, Set(topic))
  }
}

And the error I got is similar to this (I had to remove some parts for security purposes):

Error:(29, 47) overloaded method value createDirectStream with alternatives:
  (jssc: org.apache.spark.streaming.api.java.JavaStreamingContext, keyClass: Class[String], valueClass: Class[com.xxxxxxx.kafka.pharmacyData], keyDecoderClass: Class[kafka.serializer.StringDecoder], valueDecoderClass: Class[com.xxxxxxx.kafka.KafkaObjectDecoder], kafkaParams: java.util.Map[String,String], topics: java.util.Set[String])org.apache.spark.streaming.api.java.JavaPairInputDStream[String,com.xxxxxxx.kafka.pharmacyData]
  (ssc: org.apache.spark.streaming.StreamingContext, kafkaParams: scala.collection.immutable.Map[String,String], topics: scala.collection.immutable.Set[String])(implicit evidence$19: scala.reflect.ClassTag[String], implicit evidence$20: scala.reflect.ClassTag[com.xxxxxxx.kafka.pharmacyData], implicit evidence$21: scala.reflect.ClassTag[kafka.serializer.StringDecoder], implicit evidence$22: scala.reflect.ClassTag[com.xxxxxxx.kafka.KafkaObjectDecoder])org.apache.spark.streaming.dstream.InputDStream[(String, com.xxxxxxx.kafka.pharmacyData)]
  cannot be applied to (org.apache.spark.streaming.StreamingContext, scala.collection.immutable.Map[String,Object], scala.collection.immutable.Set[String])
    val stream = KafkaUtils.createDirectStream[String, pharmacyData, StringDecoder, KafkaObjectDecoder](ssc, kafkaParams, Set(topic))

And below is my custom decoder class:

import kafka.serializer.Decoder
import org.codehaus.jackson.map.ObjectMapper

class KafkaObjectDecoder extends Decoder[pharmacyData] {
  override def fromBytes(bytes: Array[Byte]): pharmacyData = {
    val mapper = new ObjectMapper()
    val pdata = mapper.readValue(bytes, classOf[pharmacyData])
    pdata
  }
}
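
For Jackson to bind the incoming bytes, `pharmacyData` (not shown in the question) must be a class with a no-arg constructor and bean-style accessors. A minimal sketch, assuming hypothetical field names (`id`, `name` are placeholders, not the real schema):

```scala
import scala.beans.BeanProperty

// Hypothetical sketch of pharmacyData -- the actual fields are not shown
// in the question. Jackson's ObjectMapper needs a no-arg constructor and
// getters/setters, which @BeanProperty generates.
class pharmacyData {
  @BeanProperty var id: String = _
  @BeanProperty var name: String = _
}
```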

Can someone please help me with these issues? Thank you!

ZHEN BIAN
  • 1) please show the full error. Looks like a compile issue 2) Kafka already comes with json deserializer. 3) But also you should upgrade Spark – OneCricketeer Feb 19 '20 at 05:41
  • Hi, I just added the full error. Can you tell me how to add the Kafka JSON deserializer? It would be better if you have an example. And upgrading Spark is not under my control. Thanks! – ZHEN BIAN Feb 19 '20 at 05:53
  • Why is it not in your control? You can update your maven dependencies and upload your own Spark distribution tarballs to HDFS containing newer versions – OneCricketeer Feb 19 '20 at 05:55
  • And did you see these? https://github.com/apache/kafka/tree/1.0/connect/json/src/main/java/org/apache/kafka/connect/json – OneCricketeer Feb 19 '20 at 05:57
  • I'll need to submit the project to spark cluster. And the spark cluster runs on spark 1.6.2. – ZHEN BIAN Feb 19 '20 at 06:16
  • Your Hadoop cluster runs YARN, not a version of Spark – OneCricketeer Feb 19 '20 at 06:17
  • Also relevant https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_spark-component-guide/content/spark-choose-version.html – OneCricketeer Feb 19 '20 at 06:19
  • Yes we are using HDP, however the spark2 is not properly installed. So I can only use spark 1.6.2 – ZHEN BIAN Feb 19 '20 at 16:13
  • Well, sounds like an administrative problem and you should install it from Ambari anyway so it isn't incorrectly done – OneCricketeer Feb 19 '20 at 16:15

1 Answer


The error is saying your parameters are incorrect:

cannot be applied to (org.apache.spark.streaming.StreamingContext, scala.collection.immutable.Map[String,Object], scala.collection.immutable.Set[String])

The closest method it thinks you want is

(jssc: org.apache.spark.streaming.api.java.JavaStreamingContext,keyClass: Class[String],valueClass: Class[com.xxxxxxx.kafka.pharmacyData],keyDecoderClass: Class[kafka.serializer.StringDecoder],valueDecoderClass: Class[com.xxxxxxx.kafka.KafkaObjectDecoder],kafkaParams: java.util.Map[String,String],topics: java.util.Set[String])
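Notice that the Scala overload you are actually calling expects `kafkaParams: Map[String, String]`, but your map is declared `Map[String, Object]` (it mixes `String` values with a `java.lang.Boolean`), so no overload matches. A minimal sketch of the fix, assuming the same topic and decoder classes; note that this cannot be run outside a Spark/Kafka environment, and that the Kafka 0.8-style consumer API used by Spark 1.6 expects `smallest`/`largest` for `auto.offset.reset` rather than `earliest`:

```scala
// Sketch: every value must be a String so the map infers as
// Map[String, String], matching the Spark 1.6 Scala overload.
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> "kafka.example.net:9092", // placeholder broker address
  "group.id" -> "consumer-group-2",
  "auto.offset.reset" -> "smallest" // 0.8-style API uses smallest/largest
)

val stream = KafkaUtils.createDirectStream[String, pharmacyData, StringDecoder, KafkaObjectDecoder](
  ssc, kafkaParams, Set("hw_insights"))
```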

OneCricketeer
  • Sorry, I don't understand which part I'm doing is not correct. Can you be more specific? I'm really a novice at Spark Streaming and Kafka. Thank you very much! – ZHEN BIAN Feb 19 '20 at 06:22
  • This is a Scala problem, not Kafka or Streaming. Where did you copy this code from? Are you using an IDE to write it? The error tells you that the problem exists on line 29 – OneCricketeer Feb 19 '20 at 12:55
  • Yes, I'm using IntelliJ IDEA – ZHEN BIAN Feb 19 '20 at 16:07
  • And there are no errors shown on the createDirectStream method? – OneCricketeer Feb 19 '20 at 16:08
  • No, error is on line 29, which is exactly the line I use createDirectStream method. Therefore, I think the error is because I didn't use this method properly. – ZHEN BIAN Feb 19 '20 at 16:37