I was working with ArrayWritable, and at some point I needed to check how Hadoop serializes it. To look at the raw map output I set job.setNumReduceTasks(0) (driver sketch below the output), and this is what I got:
0 IntArrayWritable@10f11b8
3 IntArrayWritable@544ec1
6 IntArrayWritable@fe748f
8 IntArrayWritable@1968e23
11 IntArrayWritable@14da8f4
14 IntArrayWritable@18f6235
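For completeness, the driver was set up roughly like this (a sketch, not my exact code; the class name IntArrayTest and the use of args for the paths are just placeholders, and depending on the Hadoop version Job.getInstance(conf) may be preferred over new Job(conf)):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IntArrayTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "IntArrayWritable test");
        job.setJarByClass(IntArrayTest.class);

        // MyMapper is the static nested class shown below
        job.setMapperClass(MyMapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(IntArrayWritable.class);

        // map-only job: the map output goes straight to the output format,
        // which is left at the default here
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}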
This is the test mapper I was using:
public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntArrayWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        int red = Integer.parseInt(value.toString());
        // build an array of 100 consecutive ints starting at the parsed value
        IntWritable[] a = new IntWritable[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(red + i);
        }
        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);
        context.write(key, aw);
    }
}
IntArrayWritable is taken from the example given in the ArrayWritable javadoc:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
I actually checked the Hadoop source code, and this output makes no sense to me. ArrayWritable should not serialize the class name, and there is no way that an array of 100 IntWritables can be serialized into just 6 or 7 hexadecimal digits. Yet the application actually seems to work just fine, and the reducer deserializes the right values...
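As far as I can tell, write() in ArrayWritable boils down to something like this (paraphrased from the source I read, not a verbatim copy):

// inside org.apache.hadoop.io.ArrayWritable, where 'values' is the backing Writable[]
public void write(DataOutput out) throws IOException {
    out.writeInt(values.length);              // only the element count, no class name
    for (int i = 0; i < values.length; i++) {
        values[i].write(out);                 // each element writes its own bytes
    }
}

So, if I am counting right, 100 IntWritables should come out as 4 bytes for the length plus 4 bytes per element, roughly 404 bytes, nothing like the short strings above.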
What is happening? What am I missing?