
I'm confused about how Spark interacts with HBase in terms of data format. For instance, when I omit the lines marked 'ERROR' in the following code snippet, it runs well, but with them included I get a 'Task not serializable' error related to a serialization issue.

How should I change the code, and why does this error happen?

My code is as follows:

// HBase 
    Configuration hconfig = HBaseConfiguration.create();
    hconfig.set("hbase.zookeeper.property.clientPort", "2222");
    hconfig.set("hbase.zookeeper.quorum", "127.0.0.1");  
    HConnection hconn = HConnectionManager.createConnection(hconfig);  
    HTable htable = new HTable(hconfig, Bytes.toBytes(tableName));       

// KAFKA configuration 
    Set<String> topics = Collections.singleton(topic); 

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "localhost:9092");
    kafkaParams.put("zookeeper.connect", "localhost:2222");
    kafkaParams.put("group.id", "tag_topic_id");  

//Spark Stream  
    JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                ssc, String.class, String.class, StringDecoder.class, StringDecoder.class, kafkaParams, topics );  

    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {   

        @Override
        public String call(Tuple2<String, String> tuple2)  { 
            return tuple2._2();
        }
    });  

    JavaDStream<String> records = lines.flatMap(new FlatMapFunction<String, String>() {  

        @Override
        public Iterator<String> call(String x) throws IOException {   

////////////// Put into HBase : ERROR ///////////////////// 
            String[] data = x.split(","); 

            if (null != data && data.length > 2 ){ 
                SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");   
                String ts = sdf.format(new Date());

                Put put = new Put(Bytes.toBytes(ts)); 

                put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("LINEID"), Bytes.toBytes(data[0]));
                put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("TAGID"), Bytes.toBytes(data[1]));
                put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("VAL"), Bytes.toBytes(data[2]));

                htable.put(put); // ***** ERROR ******** 
                htable.close();  
            }
            return Arrays.asList(COLDELIM.split(x)).iterator(); 
        } 

    }); 

    records.print();

    ssc.start();
    ssc.awaitTermination();

When I start my application, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
at org.apache.spark.streaming.dstream.DStream$$anonfun$flatMap$1.apply(DStream.scala:554)
at org.apache.spark.streaming.dstream.DStream$$anonfun$flatMap$1.apply(DStream.scala:554)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:682)
at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:264)
at org.apache.spark.streaming.dstream.DStream.flatMap(DStream.scala:553)
at org.apache.spark.streaming.api.java.JavaDStreamLike$class.flatMap(JavaDStreamLike.scala:172)
at org.apache.spark.streaming.api.java.AbstractJavaDStreamLike.flatMap(JavaDStreamLike.scala:42) 

Caused by: java.io.NotSerializableException: org.apache.hadoop.hbase.client.HTable
Serialization stack:
- object not serializable (class: org.apache.hadoop.hbase.client.HTable, value: MCSENSOR;hconnection-0x6839203b)
  • You have a clue in the stack trace itself (i.e. `object not serializable (class: org.apache.hadoop.hbase.client.HTable`): the way you are creating the HTable instance for puts should be changed. – Ram Ghadiyaram Jan 20 '17 at 06:56
  • Thanks for your reply. I'm already instantiating the HTable like `HTable htable = new HTable(hconfig, Bytes.toBytes(tableName));`, but it still produces the same error. – Chris Joo Jan 20 '17 at 07:19

1 Answer

You have a hint here from the serialization debugger:

Caused by: java.io.NotSerializableException: org.apache.hadoop.hbase.client.HTable
Serialization stack:
- object not serializable (class: org.apache.hadoop.hbase.client.HTable, value: MCSENSOR;hconnection-0x6839203b)
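In other words, Spark must serialize the whole `flatMap` closure to ship it to the executors, and the anonymous `FlatMapFunction` captures the outer `htable`, which wraps a live connection and is not `Serializable`. The same failure can be reproduced with plain Java serialization, no Spark required; the class and names below are illustrative stand-ins, not HBase or Spark code:

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureDemo {
    // Stand-in for HTable: wraps a live connection, deliberately NOT Serializable
    static class FakeTable {
        final String name;
        FakeTable(String name) { this.name = name; }
    }

    static String trySerialize() {
        FakeTable table = new FakeTable("MCSENSOR");
        // A serializable "task" that captures the non-serializable table,
        // just like the anonymous FlatMapFunction captures htable
        Function<String, String> task =
            (Function<String, String> & Serializable) s -> table.name + ":" + s;
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task); // this is what Spark's ClosureCleaner attempts
            return "serialized OK";
        } catch (NotSerializableException e) {
            return "NotSerializableException: " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize());
    }
}
```

Any non-serializable object captured by a Spark closure fails the same way, which is why the fix is to create the object inside the closure rather than on the driver.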

Put the part below inside the `FlatMapFunction`, before the `call` method (i.e., inside the closure where you use it); that should solve the issue:

Configuration hconfig = HBaseConfiguration.create();
    hconfig.set("hbase.zookeeper.property.clientPort", "2222");
    hconfig.set("hbase.zookeeper.quorum", "127.0.0.1");  
    HConnection hconn = HConnectionManager.createConnection(hconfig);  
    HTable htable = new HTable(hconfig, Bytes.toBytes(tableName));  
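A more efficient variant (a sketch under assumptions, not the answer author's code): rather than opening a connection per record inside `flatMap`, keep `flatMap` purely for parsing and write to HBase in `foreachRDD`/`foreachPartition`, so one connection is shared per partition. This assumes `tableName` and `familyName` are final local `String`s visible to the lambda, and the same HBase client API level as the question:

```java
records.foreachRDD(rdd -> rdd.foreachPartition(partition -> {
    // Runs on the executor: a safe place to create non-serializable clients
    Configuration hconfig = HBaseConfiguration.create();
    hconfig.set("hbase.zookeeper.property.clientPort", "2222");
    hconfig.set("hbase.zookeeper.quorum", "127.0.0.1");
    try (HConnection hconn = HConnectionManager.createConnection(hconfig);
         HTableInterface htable = hconn.getTable(tableName)) {
        while (partition.hasNext()) {
            String[] data = partition.next().split(",");
            if (data.length > 2) {
                String ts = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
                Put put = new Put(Bytes.toBytes(ts));
                put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("LINEID"), Bytes.toBytes(data[0]));
                put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("TAGID"), Bytes.toBytes(data[1]));
                put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("VAL"), Bytes.toBytes(data[2]));
                htable.put(put);
            }
        }
    } // connection and table are closed once per partition, not once per record
}));
```

This keeps the closure free of driver-side state while avoiding a connection (and a `close()`) for every single record.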
Ram Ghadiyaram
  • Wow, thanks so much! The issue noted above has been resolved with your tip, but another error occurred after that: – Chris Joo Jan 20 '17 at 11:37
  • Thank you. I've posted another question; I look forward to your answer. – Chris Joo Jan 20 '17 at 13:30