
I am writing code in which I am trying to produce messages to Kafka from Spark, but my code doesn't work. Here is my code:

import org.apache.kafka.clients.producer.{ KafkaProducer, ProducerConfig, ProducerRecord }
import org.apache.spark.streaming._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import java.util._

object Smack_Kafka_Spark extends App {
  def main(args: Array[String]) {
    val kafkaBrokers = "localhost:2181"

    val kafkaOpTopic = "test"
    /*val props = new HashMap[String, Object]()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokers)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")*/

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:2181")

    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    var spark: SparkSession = null
    val textFile: RDD[String] = spark.sparkContext.textFile("dataset.txt")
    textFile.foreach(record => {
      val data = record.toString
      val message = new ProducerRecord[String, String](kafkaOpTopic, null, data)
      producer.send(message)
    })
    producer.close()
  }
}

This is the error I got:

log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
at Smack_Kafka_Spark$.main(Smack_Kafka_Spark.scala:25)
at Smack_Kafka_Spark.main(Smack_Kafka_Spark.scala)

I will be very grateful for any help!


1 Answer


You are getting a NullPointerException because your SparkSession is null. Create it like below.

val spark : SparkSession = SparkSession.builder()
  .appName("Smack_Kafka_Spark")
  .master("local[*]")
  .getOrCreate()

Now read your text file like below.

val textFile: Dataset[String] = spark.read.textFile("dataset.txt")

Another issue you might face when you run your program is

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.kafka.clients.producer.KafkaProducer

The KafkaProducer is not serializable. You would need to move your KafkaProducer instance creation inside foreachPartition. Please check the SO post spark kafka producer serializable.
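
For reference, here is a minimal sketch of that approach, reusing the serializer settings and "test" topic from the question (it assumes the Kafka broker is reachable at localhost:9092; the 2181 address in the question is ZooKeeper's port, not the broker's):

import java.util.Properties
import org.apache.kafka.clients.producer.{ KafkaProducer, ProducerRecord }

textFile.rdd.foreachPartition { (partition: Iterator[String]) =>
  // Build the config and the producer on the executor, so nothing has to be serialized.
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // the Kafka broker, not ZooKeeper (2181)
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)

  // One producer per partition; send each line as a value-only record.
  partition.foreach { line =>
    producer.send(new ProducerRecord[String, String]("test", null, line))
  }
  producer.close()
}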

  • Thank you! As you said, the issue of serialization came up. And then I followed the link you had mentioned. – edkeveked Jan 15 '17 at 06:45
  • This is the corrected code: val textFile: RDD[String] = spark.sparkContext.textFile("Dataset.txt") textFile.foreachPartition((partitions: Iterator[String]) => { val producer: KafkaProducer[String, String] = new KafkaProducer[String, String](props) partitions.foreach((line: String) => { try { producer.send(new ProducerRecord[String, String]("test", line)) } catch { case ex: Exception => { } } })}) – edkeveked Jan 15 '17 at 06:51
  • Could you please tell me how I can add a consumer in this code to retrieve what has been sent by the producer? – edkeveked Jan 15 '17 at 07:24
  • Your Kafka consumer should be a different Spark application. You would need to create a StreamingContext and use methods from the KafkaUtils class. Please check the sample at https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala. You can google and find some working examples as well. – abaghel Jan 15 '17 at 07:59
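
Along the lines of that last comment, here is a minimal consumer sketch using the spark-streaming-kafka-0-10 integration (the broker address localhost:9092, the group id, and the batch interval are assumptions, not from the original post):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{ Seconds, StreamingContext }
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object Smack_Kafka_Consumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Smack_Kafka_Consumer").setMaster("local[*]")
    // Poll Kafka in 5-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "smack-consumer-group",
      "auto.offset.reset" -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Subscribe to the same "test" topic the producer writes to and print each message value.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("test"), kafkaParams))
    stream.map(record => record.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}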