
I am currently trying to deserialize a custom object where one of the fields is a MapWritable and the other is a String. Serialization seems to work, but I can't verify that the object is being recreated properly. Here are my fields and the write() and readFields() methods:

public class ExchangeDataSample implements DataSample {

    private String labelColumn;

    private MapWritable values = new MapWritable();

    ...other methods...

    @Override
    public void readFields(DataInput in) throws IOException {
        values.clear();
        values.readFields(in);
        labelColumn = in.readLine();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        values.write(out);
        out.writeBytes(labelColumn);
    }
}
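For reference, an editor's sketch (not the poster's actual class, names are invented for illustration): a write()/readFields() pair must mirror each other call for call. writeBytes() emits raw bytes with no length prefix or terminator, so a later readLine() cannot reliably tell where the string ends; the matched writeUTF()/readUTF() pair is length-prefixed and symmetric.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal round trip showing a symmetric, length-prefixed string
// serialization. Class and method names are illustrative only.
public class RoundTrip {

    public static byte[] serialize(String label) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(label); // length-prefixed, unlike writeBytes()
        return buf.toByteArray();
    }

    public static String deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return in.readUTF(); // reads back exactly what writeUTF() wrote
    }

    public static void main(String[] args) throws IOException {
        String restored = deserialize(serialize("labelColumn"));
        System.out.println(restored); // prints "labelColumn"
    }
}
```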

I keep getting this exception in my MapReduce Job:

java.lang.Exception: java.io.EOFException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readUTF(DataInputStream.java:609)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:207)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
    at decisiontree.data.ExchangeDataSample.readFields(ExchangeDataSample.java:98)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:96)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1688)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1637)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I appreciate the help greatly. Thanks.

Alex Ramos

1 Answer


You are getting this exception because you read without checking the end of file. Try changing the readFields method to this:

@Override
public void readFields(DataInput in) throws IOException {
    values.clear();
    byte[] b = new byte[1024];
    // checks for end of file
    if (((DataInputStream) in).read(b) != -1) {
        values.readFields(in);
        labelColumn = in.readLine();
    }
}
Aurasphere
  • Thanks for the help, but I'm still getting the same error at "values.readFields(in)". Is this because the stream is already being read in the if statement? Then it's -1 again. Should I just wrap it in a try/catch? Also, does the MapWritable value not need to be explicitly re-set? I'm not very familiar with serializing/deserializing complex structures. – Alex Ramos Nov 10 '15 at 14:49
  • I wrapped it in a try/catch, but simply printing the string labelColumn after in.readLine() shows a bunch of garbled data that should be what's in the MapWritable object. If I can't get this working, I may have to try Java JSON or something. – Alex Ramos Nov 10 '15 at 15:01
  • @AlexRamos Yeah, that would be the content without deserialization. There's something odd here, but if you want a quick solution I would suggest switching to JSON, which is indeed easier. Take a look at this library: https://code.google.com/p/json-io/ – Aurasphere Nov 10 '15 at 15:13
  • I ended up using jackson JSON libraries for this. I upvoted your answer since your comment was what led me to it. Thanks! – Alex Ramos May 09 '16 at 16:29
  • Happy to hear that! ;) – Aurasphere May 09 '16 at 16:34
  • Also, if you don't feel like accepting my answer and you solved your problem, you should write an answer yourself and accept that instead so this question is closed. Thank you! – Aurasphere May 09 '16 at 16:52
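Since the asker mentions settling on Jackson, here is an editor's sketch of that route (class and field names are illustrative, not taken from the original code; requires jackson-databind on the classpath): serialize a plain POJO to a JSON string and back, rather than hand-writing Writable methods.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;

// Illustrative POJO standing in for the custom sample class.
public class JsonSample {
    public String labelColumn;
    public Map<String, String> values = new HashMap<>();

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        JsonSample sample = new JsonSample();
        sample.labelColumn = "price";
        sample.values.put("open", "1.1");

        // Round trip: object -> JSON string -> object.
        String json = mapper.writeValueAsString(sample);
        JsonSample restored = mapper.readValue(json, JsonSample.class);

        System.out.println(restored.labelColumn); // prints "price"
    }
}
```

Because the string carries its own structure, there is no way for the reader and writer to fall out of sync the way mismatched writeBytes()/readLine() calls can.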