It seems to me that an org.apache.hadoop.io.serializer.Serialization could be written to serialize the Java types directly, in the same format that the wrapper classes serialize them into. That way the Mappers and Reducers wouldn't have to deal with the wrapper classes.

user533020
1 Answer
There is nothing stopping you from changing the serialization to use a different mechanism, such as Java's Serializable interface, Thrift, or Protocol Buffers.
In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects; just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization, but this can be changed by setting the following configuration property:
io.serializations=org.apache.hadoop.io.serializer.JavaSerialization
Bear in mind, however, that anything that expects a Writable (input/output formats, partitioners, comparators, etc.) will need to be replaced by a version that can be passed a Serializable instance rather than a Writable instance.
Some more links for the curious reader:
- http://www.tom-e-white.com/2008/07/rpc-and-serialization-with-hadoop.html
- What are the connections and differences between Hadoop Writable and java.io.serialization? This seems to be a similar question to yours, and Tariq's answer there has a good link to a thread in which Doug Cutting explains the rationale for using Writables over Serializables.


Chris White
- Yeah, I understand I can change the serialization implementation and that the Writable serialization format is superior. It still doesn't explain the requirement for the wrapper classes. Oh well, I think this is the best answer I'm going to get without directly asking one of the Hadoop creators. – user533020 Jun 20 '13 at 19:51
- I guess I don't understand what you're asking for when you say 'requirement for the wrapper class'. Writable isn't a wrapper; it's an interface denoting that the objects conform to some 'contract' and know how to serialize themselves. – Chris White Jun 20 '13 at 19:58
- What I'm saying is that each of the Java types has a Writable wrapper class (int - IntWritable, String - Text, etc.). They decorate the Java type to say how to serialize it. When writing a mapper or reducer you have to unwrap the input by calling its 'get' method to obtain the Java type, and then wrap the output Java type again. All of that seems unnecessary. – user533020 Jun 20 '13 at 21:54
- The WritableSerialization could have been designed so that a writable serializer is registered with it for each Java type; when it receives a value to serialize, it calls the corresponding serializer. That way the serialization is completely transparent to the mappers and reducers, and you still get the benefit of the leaner serialization that the Writables provide. This design pattern of registering an implementation per type was used for the RawComparator functionality the Writables provide; I just don't understand why they didn't do the same thing for the serialization functionality. – user533020 Jun 20 '13 at 21:55
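To make the last comment concrete, here is a minimal sketch of the per-type registration idea, using only the JDK. It is not Hadoop code: `TypeSerializerRegistry`, `TypeSerializer`, `withDefaults` and `toBytes` are hypothetical names invented for illustration. The only Hadoop-related assumption is that IntWritable and LongWritable write their values with `DataOutput.writeInt` and `DataOutput.writeLong`, so the bytes produced here should match theirs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Hadoop.
public class TypeSerializerRegistry {

    /** Writes one value of type T to a binary stream. */
    public interface TypeSerializer<T> {
        void write(T value, DataOutput out) throws IOException;
    }

    // One serializer per plain Java type, looked up by the value's runtime class.
    private final Map<Class<?>, TypeSerializer<?>> serializers = new HashMap<>();

    public <T> void register(Class<T> type, TypeSerializer<T> serializer) {
        serializers.put(type, serializer);
    }

    @SuppressWarnings("unchecked")
    public <T> void serialize(T value, DataOutput out) throws IOException {
        TypeSerializer<T> s = (TypeSerializer<T>) serializers.get(value.getClass());
        if (s == null) {
            throw new IOException("No serializer registered for " + value.getClass().getName());
        }
        s.write(value, out);
    }

    /** Registry preloaded with two types, written as fixed-width big-endian values. */
    public static TypeSerializerRegistry withDefaults() {
        TypeSerializerRegistry registry = new TypeSerializerRegistry();
        registry.register(Integer.class, (v, out) -> out.writeInt(v));
        registry.register(Long.class, (v, out) -> out.writeLong(v));
        return registry;
    }

    /** Convenience helper: serialize a single value to a byte array. */
    public static byte[] toBytes(Object value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            withDefaults().serialize(value, new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toBytes(42).length);   // prints 4
        System.out.println(toBytes(42L).length);  // prints 8
    }
}
```

With something like this, a mapper could hand a plain `Integer` to the framework and the lookup would pick the serializer, instead of the mapper wrapping the value itself. The sketch only covers writing; a real Serialization would also need a matching deserializer per type.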