
I want to read binlog data from Kafka using Spark Streaming. The binlog data is collected by canal (which uses protobuf-2.4.1), and I have to use protobuf-2.5.0 in the Spark Streaming environment. Now I get the following exception:

16/07/11 15:13:01 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 : java.lang.RuntimeException: Unable to find proto buffer class
    at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at com.data.binlog.BinlogEntryUtil.deserializeFromProtoBuf(BinlogEntryUtil.java:30)
    at main.com.data.scala.Utils$.binlogDecode(Utils.scala:30)
    at main.com.data.scala.IntegrateKafka$$anonfun$main$4.apply(IntegrateKafka.scala:37)
    at main.com.data.scala.IntegrateKafka$$anonfun$main$4.apply(IntegrateKafka.scala:37)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at main.com.data.scala.IntegrateKafka$$anonfun$main$5$$anonfun$apply$2$$anonfun$apply$3.apply(IntegrateKafka.scala:42)
    at main.com.data.scala.IntegrateKafka$$anonfun$main$5$$anonfun$apply$2$$anonfun$apply$3.apply(IntegrateKafka.scala:42)
    at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
    at main.com.data.scala.IntegrateKafka$.logInfo(IntegrateKafka.scala:16)
    at main.com.data.scala.IntegrateKafka$$anonfun$main$5$$anonfun$apply$2.apply(IntegrateKafka.scala:42)
    at main.com.data.scala.IntegrateKafka$$anonfun$main$5$$anonfun$apply$2.apply(IntegrateKafka.scala:39)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.alibaba.otter.canal.protocol.CanalEntry$Entry
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:768)
    ... 29 more
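For context, the consuming side looks roughly like this (a minimal sketch, not my exact code; the broker address and topic name are placeholders, and I assume the Spark 1.x direct Kafka API):

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.DefaultDecoder;
import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import com.data.binlog.BinlogEntryUtil;

public class BinlogStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("binlog-consumer");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Placeholder broker list and topic name.
        Map<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "broker1:9092");
        Set<String> topics = new HashSet<String>(Arrays.asList("binlog"));

        // Read the raw byte[] payloads written by the canal-side producer.
        JavaPairInputDStream<String, byte[]> stream = KafkaUtils.createDirectStream(
                jssc, String.class, byte[].class,
                StringDecoder.class, DefaultDecoder.class, kafkaParams, topics);

        // Decode each payload; deserializeFromProtoBuf is where the exception above originates.
        stream.map(record -> BinlogEntryUtil.deserializeFromProtoBuf(record._2())).print();

        jssc.start();
        jssc.awaitTermination();
    }
}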

The code of com.data.binlog.BinlogEntryUtil.deserializeFromProtoBuf is here:

public static Entry deserializeFromProtoBuf(byte[] input) {
    Entry entry = null;

    try {
        // The payload is read back as a Java-serialized object rather than
        // being parsed directly as protobuf bytes.
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(input));
        entry = (Entry) ois.readObject();
    } catch (ClassNotFoundException e) {
        logger.error("ClassNotFoundException: " + e);
    } catch (IOException e) {
        logger.error("IOException: " + e);
    }

    return entry;
}

But I can find CanalEntry$Entry.class in my jar:

-rw---- 2.0 fat 14431 bl defN 16-Jul-11 15:11 com/alibaba/otter/canal/protocol/CanalEntry$Entry.class

I have also tried generating CanalEntry.java and CanalPacket.java with protoc-2.5.0, but I got the same exception: java.lang.ClassNotFoundException: com.alibaba.otter.canal.protocol.CanalEntry$Entry.

Can anybody give me some suggestions on how to read the binlog data (serialized with protobuf-2.4.1) using protobuf-2.5.0? Thanks.


1 Answer


After some more tries, I found that this is not a protobuf problem.
I got this exception because the binlog data isn't pure protobuf bytes; it is the Java-serialized form of another class that wraps the protobuf bytes.
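To make the difference concrete, here is a minimal sketch of the two-step decoding, assuming the wrapper is a plain Serializable class that carries the protobuf bytes. BinlogWrapper and getPayload() are hypothetical names for illustration, not canal's API; only the idea matters: undo the Java serialization layer first, then hand the embedded bytes to the protobuf parser.

import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;

import com.alibaba.otter.canal.protocol.CanalEntry;

public class WrappedBinlogDecoder {

    // Hypothetical stand-in for whatever class the producer actually
    // serialized; the real class name and fields will differ.
    public static class BinlogWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private byte[] payload; // raw protobuf bytes of a CanalEntry.Entry

        public byte[] getPayload() {
            return payload;
        }
    }

    // Two-step decoding: Java deserialization of the wrapper first,
    // then protobuf parsing of the embedded bytes.
    public static CanalEntry.Entry decodeWrapped(byte[] input) throws Exception {
        try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(input))) {
            BinlogWrapper wrapper = (BinlogWrapper) ois.readObject();
            return CanalEntry.Entry.parseFrom(wrapper.getPayload());
        }
    }

    // If the Kafka payload were raw protobuf output (Entry.toByteArray()),
    // a single parseFrom call would be enough:
    public static CanalEntry.Entry decodeRaw(byte[] input) throws Exception {
        return CanalEntry.Entry.parseFrom(input);
    }
}

With pure protobuf bytes, decodeRaw would be enough; in my case the extra Java-serialization layer has to be peeled off first.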
