
I am learning the Hadoop MapReduce framework. I am struggling to understand why we can't use Java primitive data types in MapReduce.

rraghuva
  • Most likely because of the way that data is passed around. In many places you need Objects (and would need special handling for primitives). But: does it matter? – Thilo Nov 24 '15 at 11:23

2 Answers


In Hadoop, interprocess communication is built on remote procedure calls (RPC). The RPC protocol uses serialization to render a message into a binary stream at the sender, and the binary stream is deserialized back into the original message at the receiver.

For Hadoop to be effective, the serialization/deserialization process must be optimized, because a huge number of remote calls happen between the nodes in the cluster. So the serialization format should be fast, compact, extensible and interoperable. For this reason, the Hadoop framework provides its own IO classes to replace the Java primitive data types, e.g. IntWritable for int, LongWritable for long, Text for String, etc.
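
For illustration, here is a minimal word-count-style mapper (a sketch, not part of the original answer; the class name TokenMapper is made up) showing that every key and value crossing the framework boundary is a Writable wrapper rather than a Java primitive or String:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Word-count-style mapper: the framework serializes Writables, not primitives.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1); // wraps the int 1
    private final Text word = new Text();                      // wraps a String

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            word.set(token);          // reuse the same Writable object per record
            context.write(word, ONE); // emitted key/value must both be Writables
        }
    }
}
```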

Refer to Hadoop: The Definitive Guide, 4th edition, for more details.

From the Apache website, the purpose of Writable is described as:

A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.
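
As a rough sketch of that protocol (the class PointWritable below is hypothetical, chosen just for this example), a custom Writable only needs to implement write(DataOutput) and readFields(DataInput); note that no class metadata goes into the stream, only the raw field bytes:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Minimal custom Writable: the whole protocol is these two methods.
public class PointWritable implements Writable {

    private int x;
    private int y;

    public PointWritable() { }          // no-arg constructor needed for deserialization

    public PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);                // serialize: only the raw bytes of the fields
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();               // deserialize in the same order they were written
        y = in.readInt();
    }
}
```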

Ravindra babu

Java serialization prefixes class metadata (the class name and a hash of the class) before each instance of the object in the serialized stream. Because of this, you do not need to specify the class when reading an object back, but it adds overhead to every read, since each object could be an instance of a different class.

With Hadoop serialization, we specify the class when retrieving the data (for example via the InputFormat), so there is no need for a per-object prefix: we already know what we are retrieving. This improves speed and performance during RPCs.
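
As a hedged illustration of that size difference (the class name SerializationSizeDemo and the value 163 are made up for this sketch), compare what actually goes into the stream for a single int under the two schemes:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.hadoop.io.IntWritable;

// Rough comparison of the two serialization formats for a single int value.
public class SerializationSizeDemo {

    public static void main(String[] args) throws IOException {
        // Java serialization: the stream carries class metadata
        // (class name, serialVersionUID, field descriptors) plus the value.
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
            oos.writeObject(Integer.valueOf(163));
        }

        // Hadoop Writable: the reader already knows the class,
        // so only the four bytes of the int go into the stream.
        ByteArrayOutputStream writableBytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(writableBytes)) {
            new IntWritable(163).write(out);
        }

        System.out.println("java.io.Serializable: " + javaBytes.size() + " bytes");
        System.out.println("IntWritable:          " + writableBytes.size() + " bytes");
    }
}
```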

Tanveer Dayan