
I am currently trying to deserialize a custom object where one of the fields is a MapWritable and the other is a String. Serialization seems to work, but I can't verify that the object is being recreated properly. Here are my fields and the write() and readFields() methods:

public class ExchangeDataSample implements DataSample {

    private String labelColumn;

    private MapWritable values = new MapWritable();

    ...other methods...

    @Override
    public void readFields(DataInput in) throws IOException {
        values.clear();
        values.readFields(in);
        labelColumn = in.readLine();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        values.write(out);
        out.writeBytes(labelColumn);
    }
}
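For reference, an editor's sketch (not the poster's actual class, names are invented for illustration): a write()/readFields() pair must mirror each other call for call. writeBytes() emits raw bytes with no length prefix or terminator, so a later readLine() cannot reliably tell where the string ends; the matched writeUTF()/readUTF() pair is length-prefixed and symmetric.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal round trip showing a symmetric, length-prefixed string
// serialization. Class and method names are illustrative only.
public class RoundTrip {

    public static byte[] serialize(String label) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(label); // length-prefixed, unlike writeBytes()
        return buf.toByteArray();
    }

    public static String deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return in.readUTF(); // reads back exactly what writeUTF() wrote
    }

    public static void main(String[] args) throws IOException {
        String restored = deserialize(serialize("labelColumn"));
        System.out.println(restored); // prints "labelColumn"
    }
}
```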

I keep getting this exception in my MapReduce Job:

java.lang.Exception: java.io.EOFException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readUTF(DataInputStream.java:609)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:207)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
    at decisiontree.data.ExchangeDataSample.readFields(ExchangeDataSample.java:98)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:96)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1688)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1637)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I appreciate the help greatly. Thanks.

Alex Ramos

1 Answer


You are getting this exception because you read without checking the end of file. Try changing the readFields method to this:

@Override
public void readFields(DataInput in) throws IOException {
    values.clear();
    byte[] b = new byte[1024];
    // checks for end of file
    if (((DataInputStream) in).read(b) != -1) {
        values.readFields(in);
        labelColumn = in.readLine();
    }
}
Aurasphere
  • Thanks for the help, but I'm still getting the same error at "values.readFields(in)". Is this because the stream is already being read in the if statement? Then it's -1 again. Should I just wrap it in a try/catch? Also, does the MapWritable value not need to be explicitly re-set? I'm not very familiar with serializing/deserializing complex structures. – Alex Ramos Nov 10 '15 at 14:49
  • I wrapped it in a try/catch, but simply printing the string labelColumn after in.readLine() shows a bunch of garbled data that should be what's in the MapWritable object. If I can't get this working, I may have to try Java JSON or something. – Alex Ramos Nov 10 '15 at 15:01
  • @AlexRamos Yeah, that would be the content without deserialization. There's something odd here, but if you want a quick solution I would suggest switching to JSON, which is indeed easier. Take a look at this library: https://code.google.com/p/json-io/ – Aurasphere Nov 10 '15 at 15:13
  • I ended up using jackson JSON libraries for this. I upvoted your answer since your comment was what led me to it. Thanks! – Alex Ramos May 09 '16 at 16:29
  • Happy to hear that! ;) – Aurasphere May 09 '16 at 16:34
  • Also, if you don't feel like accepting my answer and you solved your problem, you should write an answer yourself and accept that instead so this question is closed. Thank you! – Aurasphere May 09 '16 at 16:52
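Since the asker mentions settling on Jackson, here is an editor's sketch of that route (class and field names are illustrative, not taken from the original code; requires jackson-databind on the classpath): serialize a plain POJO to a JSON string and back, rather than hand-writing Writable methods.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;

// Illustrative POJO standing in for the custom sample class.
public class JsonSample {
    public String labelColumn;
    public Map<String, String> values = new HashMap<>();

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        JsonSample sample = new JsonSample();
        sample.labelColumn = "price";
        sample.values.put("open", "1.1");

        // Round trip: object -> JSON string -> object.
        String json = mapper.writeValueAsString(sample);
        JsonSample restored = mapper.readValue(json, JsonSample.class);

        System.out.println(restored.labelColumn); // prints "price"
    }
}
```

Because the string carries its own structure, there is no way for the reader and writer to fall out of sync the way mismatched writeBytes()/readLine() calls can.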