
I wrote the code below to consume Kafka data in a Spark Streaming job.

Is anything missing for streaming from Kafka or for processing the data once it has been retrieved? How can I test whether data is actually being retrieved?

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import scala.Tuple2;

// StreamingExamples.setStreamingLogLevels();
        SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount").setMaster("local[*]");

        // Create the context with a 1 second batch interval
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(1000));

        Map<String, Integer> topicMap = new HashMap<>();
        topicMap.put("Ptopic", 1);

        // Receiver-based stream: (context, ZooKeeper quorum, consumer group id, topics)
        JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, "localhost:2181", "5",
                topicMap);

        // Keep only the message value; the key is in tuple2._1()
        JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
            @Override
            public String call(Tuple2<String, String> tuple2) {
                return tuple2._2();
            }
        });

        lines.print();
        jssc.start();
        jssc.awaitTermination();

At the moment the result is just being printed.
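
To verify that data is actually being retrieved, one option is to publish a few test messages to the topic and watch them show up in the lines.print() output. A minimal producer sketch, assuming Kafka 0.8.2+ with a broker at localhost:9092 and the kafka-clients dependency on the classpath (both assumptions, not part of the code above):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PtopicTestProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            for (int i = 0; i < 10; i++) {
                // send plain test strings to the topic the streaming job subscribes to
                producer.send(new ProducerRecord<>("Ptopic", "test-message-" + i));
            }
            producer.close();
        }
    }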

Vimal Dhaduk
  • Did you check this: https://spark.apache.org/docs/1.6.1/streaming-kafka-integration.html ? It may also be a good idea to first create a Kafka producer and consumer via the terminal; if that works, you can start integrating Spark. – lidox Nov 15 '16 at 06:57
  • Yes, I tried it. With this code I am able to get results into the "lines" variable, but how do I retrieve the strings using foreachRDD, so that I can perform some operation on them? (See the sketch after these comments.) – Vimal Dhaduk Nov 15 '16 at 06:59
  • You could sink your RDD into a database behind it, as this picture illustrates: http://spark.apache.org/docs/latest/img/streaming-arch.png After that you can load the data and run some operations. – lidox Nov 15 '16 at 07:05
  • I have two kinds of requirements: sometimes I need to sync with a database and sometimes I need to store the data into another Kafka queue. – Vimal Dhaduk Nov 15 '16 at 07:06
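
As asked in the comments, here is a minimal foreachRDD sketch for the question's lines stream (Java 8 lambdas assumed; the printed messages are illustrative only). Printing the per-batch count is also a quick way to check whether any data is arriving:

    lines.foreachRDD(rdd -> {
        // quick check that the batch actually contains data
        System.out.println("records in this batch: " + rdd.count());

        rdd.foreach(record -> {
            // perform some operation on each String, e.g. print it
            System.out.println("received: " + record);
        });
    });

Note that foreachRDD, like all output operations, must be registered before jssc.start() is called.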

1 Answer


You can use the basic transformation functions such as map, reduce, flatMap, and so on.

Update 1:

    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> tuple2) {
            return tuple2._2();
        }
    });

    // TODO: make some transformation here, for example:
    lines = lines.map(line -> {
        // clean the data: strip quotes and separator characters
        return line.replaceAll("\"", "").replaceAll("[-|,]", "");
    }).filter(line -> {
        // keep only records mentioning "Fire" (case-insensitive)
        return line.matches("(?i).*\\bFire\\b.*");
    });

    lines.print();
    jssc.start();
    jssc.awaitTermination();

The complete example is described in this blog post.

lidox
  • The code is in the question. How would this work with the "lines" variable, and from where can I send data to Kafka? – Vimal Dhaduk Nov 15 '16 at 07:29
  • Maybe check this: http://stackoverflow.com/questions/31590592/how-to-write-to-kafka-from-spark-streaming – lidox Nov 15 '16 at 10:14
  • Basically you want a method to sink a Spark stream to Kafka, right? (See the sketch below.) – lidox Nov 15 '16 at 10:20
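
For the sink-to-Kafka case from the comments, a common pattern is to create one producer per partition inside foreachRDD, since producers are not serializable and cannot be shipped from the driver. A minimal sketch, assuming the lines stream from the question, a broker at localhost:9092, a hypothetical output topic OutTopic, and the same kafka-clients imports as the producer sketch above:

    lines.foreachRDD(rdd -> {
        rdd.foreachPartition(records -> {
            // created on the executor, once per partition
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            while (records.hasNext()) {
                producer.send(new ProducerRecord<>("OutTopic", records.next()));
            }
            producer.close();
        });
    });

The database case from the comments follows the same foreachPartition pattern, with a connection (or connection pool) in place of the producer.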