
I have a use case where I need to read messages from Kafka and, for each message, extract data and query an Elasticsearch index. The response will then be used for further processing. I am getting the error below when invoking JavaEsSpark.esJsonRDD:

java.lang.ClassCastException: org.elasticsearch.spark.rdd.EsPartition incompatible with org.apache.spark.rdd.ParallelCollectionPartition at org.apache.spark.rdd.ParallelCollectionRDD.compute(ParallelCollectionRDD.scala:102)

My code snippet is below:

    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println("Usage: JavaKafkaIntegration <zkQuorum> <group> <topics> <numThreads>");
            System.exit(1);
        }

        SparkConf sparkConf = new SparkConf()
                .setAppName("JavaKafkaIntegration")
                .setMaster("local[2]")
                .set("spark.driver.allowMultipleContexts", "true");
        // Settings used when calling JavaEsSpark.esJsonRDD
        sparkConf.set("es.nodes", <NODE URL>);
        sparkConf.set("es.nodes.wan.only", "true");
        final JavaSparkContext context = new JavaSparkContext(sparkConf);

        // Create the streaming context with a 2 second batch size
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

        int numThreads = Integer.parseInt(args[3]);
        Map<String, Integer> topicMap = new HashMap<>();
        String[] topics = args[2].split(",");
        for (String topic : topics) {
            topicMap.put(topic, numThreads);
        }

        // Receive messages from Kafka
        JavaPairReceiverInputDStream<String, String> messages =
                KafkaUtils.createStream(jssc, args[0], args[1], topicMap);

        JavaDStream<String> jsons = messages
                .map(new Function<Tuple2<String, String>, String>() {
                    private static final long serialVersionUID = 1L;

                    @Override
                    public String call(Tuple2<String, String> tuple2) {
                        // Fails here with the ClassCastException
                        JavaRDD<String> esRDD =
                                JavaEsSpark.esJsonRDD(context, <index>, <search string>).values();
                        return null;
                    }
                });

        jsons.print();
        jssc.start();
        jssc.awaitTermination();
    }

The error occurs when invoking JavaEsSpark.esJsonRDD. Is this the correct way to do it? How do I successfully invoke Elasticsearch from Spark? I am running Kafka and Spark on Windows and querying an external Elasticsearch index.
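For context, a sketch of how such code is commonly restructured (assuming the same elasticsearch-hadoop and Kafka APIs shown above): esJsonRDD creates a new RDD, and RDDs can only be created where a SparkContext is available, i.e. on the driver. Calling it inside the map function ships it to the executors, which likely produces the partition ClassCastException. Moving the call into foreachRDD keeps it on the driver. The per-message query derivation (`record._2()`) is an assumption for illustration; the `<index>` placeholder is kept from the question.

```java
// Sketch only, not the question's working code: JavaEsSpark.esJsonRDD must
// run on the driver, so the Elasticsearch lookup is moved out of map(...)
// and into foreachRDD, whose closure executes on the driver per micro-batch.
messages.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>() {
    @Override
    public void call(JavaPairRDD<String, String> rdd) {
        // collect() brings the batch to the driver; acceptable for small batches only
        for (Tuple2<String, String> record : rdd.collect()) {
            String query = record._2(); // build the ES search string from the message (assumed)
            JavaRDD<String> esRDD =
                    JavaEsSpark.esJsonRDD(context, <index>, query).values();
            // ...further processing of esRDD here...
        }
    }
});
```

The trade-off is that every message triggers a separate Elasticsearch scan from the driver; for high-throughput streams, a plain Elasticsearch REST client invoked inside mapPartitions is often used instead of nesting RDDs.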

ash200
