Sequence file reading issue using Spark Java

Question

I am trying to read a sequence file generated by Hive using Spark. When I try to access the file, I get org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException:

I have tried the usual workarounds for this issue, such as making the class serializable, but I still get the error. Here is the code snippet; please let me know what I am missing.

Is the BytesWritable data type causing the issue, or is it something else?

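import java.util.List;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

JavaPairRDD<BytesWritable, Text> fileRDD = javaCtx.sequenceFile("hdfs://path_to_the_file", BytesWritable.class, Text.class);
List<String> result = fileRDD.map(new Function<Tuple2<BytesWritable, Text>, String>() {
    public String call(Tuple2<BytesWritable, Text> row) {
        return row._2().toString() + "\n"; // keep only the Text value, as a String
    }
}).collect();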
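A common cause of this exception, offered here as an assumption because neither the stack trace nor the enclosing class was posted, is that the anonymous Function class holds a hidden reference to its enclosing instance, which Spark then has to serialize along with the task. Spark's Java Function interface extends java.io.Serializable and closures are shipped with Java serialization, so the plain-Java sketch below reproduces the effect without Spark. SerFunction and ClosureCaptureDemo are hypothetical names: SerFunction stands in for Spark's Function, and ClosureCaptureDemo plays a driver class that is not serializable (for example, because it holds the JavaSparkContext, in which case marking the class Serializable is not enough). On the Java 8 compilers common at the time, an anonymous class created in an instance method keeps that reference even if it never uses it; the sketch reads a field so the capture is explicit. The BytesWritable and Text values are converted to String inside the function, so they are not part of the serialized closure.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for Spark's Java Function interface, which also
// extends java.io.Serializable so it can be shipped to executors.
interface SerFunction<T, R> extends Serializable {
    R call(T value) throws Exception;
}

// Hypothetical driver class that is NOT Serializable.
class ClosureCaptureDemo {
    private final String suffix; // set in the constructor, so not a compile-time constant

    ClosureCaptureDemo(String suffix) {
        this.suffix = suffix;
    }

    // Anonymous inner class: reading `suffix` goes through a hidden
    // reference to the enclosing ClosureCaptureDemo instance.
    SerFunction<String, String> anonymousFn() {
        return new SerFunction<String, String>() {
            @Override
            public String call(String value) {
                return value + suffix;
            }
        };
    }

    // Lambda that captures only a local copy of the field, not `this`.
    SerFunction<String, String> lambdaFn() {
        final String localSuffix = suffix;
        return value -> value + localSuffix;
    }

    // Tries plain Java serialization, the mechanism Spark uses for closures.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClosureCaptureDemo driver = new ClosureCaptureDemo("\n");
        System.out.println("anonymous class serializable: " + isSerializable(driver.anonymousFn()));
        System.out.println("lambda serializable: " + isSerializable(driver.lambdaFn()));
    }
}
```

Running main should report false for the anonymous class and true for the lambda. In the Spark snippet, the corresponding change would be to pass a lambda such as `row -> row._2().toString() + "\n"` to map, or to move the function into a static nested class, so that no reference to the enclosing object is captured.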

Please post the stack trace of the error; it would also help if you could post the whole code. — code, May 04 '17 at 07:40

score 1 · Answer 1 · answered Sep 01 '17 at 07:46

Here is what was needed to make it work:

Because we use HBase to store our data and this reducer writes its result to an HBase table, Hadoop is telling us that it doesn't know how to serialize our data, so we need to help it. Inside setUp, set the io.serializations property. You can do the same in Spark:

conf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
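Hadoop's Configuration.setStrings stores its arguments as a single comma-separated value, so the line above appends HBase's two serialization classes to whatever io.serializations already contains. The sketch below builds that value with plain strings so it runs without Hadoop or HBase on the classpath. SerializationListDemo and mergeSerializations are hypothetical names, and skipping a null existing value and dropping duplicates are extra design choices, not behavior of setStrings itself. In a Spark job, the Configuration that sequenceFile reads from is available through JavaSparkContext.hadoopConfiguration(), which is one place such a setting could be applied; that is an assumption, not something the answer states.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: builds the comma-separated value that
// conf.setStrings("io.serializations", ...) would store.
class SerializationListDemo {
    static final String MUTATION =
            "org.apache.hadoop.hbase.mapreduce.MutationSerialization";
    static final String RESULT =
            "org.apache.hadoop.hbase.mapreduce.ResultSerialization";

    static String mergeSerializations(String existing, String... extra) {
        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> names = new LinkedHashSet<>();
        if (existing != null) {
            for (String name : existing.split(",")) {
                if (!name.trim().isEmpty()) {
                    names.add(name.trim());
                }
            }
        }
        for (String name : extra) {
            names.add(name);
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // Example existing value; the real default depends on the Hadoop version.
        String existing = "org.apache.hadoop.io.serializer.WritableSerialization";
        System.out.println(mergeSerializations(existing, MUTATION, RESULT));
    }
}
```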
