Hi, I am trying to integrate Kafka with Spark Streaming.
I want to find the count of messages in each RDD (via foreachRDD) of a JavaDStream.
Please see the code below and give me some suggestions.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import kafka.serializer.StringDecoder;
import scala.Tuple2;

public class App {
    @SuppressWarnings("serial")
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("Streamingkafka")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaStreamingContext ssc = new JavaStreamingContext(sc, new Duration(1000));

        Map<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "localhost:9092");
        Set<String> topics = Collections.singleton("data_one");

        JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils.createDirectStream(
                ssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Extract the message value from each (key, value) pair
        JavaDStream<String> msgDataStream = directKafkaStream.map(new Function<Tuple2<String, String>, String>() {
            @Override
            public String call(Tuple2<String, String> tuple2) {
                return tuple2._2();
            }
        });

        msgDataStream.print();
        // Note: count() only returns a new DStream of per-batch counts;
        // without an output operation on that DStream, nothing is computed.
        msgDataStream.count();

        ssc.start();
        ssc.awaitTermination();
    }
}
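One approach I have been considering (a sketch, not tested against my setup): use foreachRDD on the stream and call count() on each RDD, since that triggers a job per micro-batch and returns the count as a long. This assumes Spark 1.6+ where foreachRDD accepts a VoidFunction; on older versions the Function<JavaRDD<String>, Void> overload would be needed instead.

```java
// Sketch: per-batch message count via foreachRDD.
// Assumes msgDataStream is the JavaDStream<String> built above, and imports:
//   org.apache.spark.api.java.JavaRDD
//   org.apache.spark.api.java.function.VoidFunction
msgDataStream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
    @Override
    public void call(JavaRDD<String> rdd) {
        // rdd.count() runs a job on this batch's RDD and returns the
        // number of messages received in the batch interval.
        System.out.println("Messages in this batch: " + rdd.count());
    }
});
```

Alternatively, msgDataStream.count().print() should also print the per-batch count, since print() is an output operation that forces the count DStream to be computed.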
Thanks in advance.