Could anyone please explain:
What are the Writable and WritableComparable interfaces in Hadoop?
What is the difference between the two?
Please explain with an example.
Thanks in advance.
Writable is an interface in Hadoop, and the types used in Hadoop must implement this interface. Hadoop provides Writable wrappers for almost all Java primitive types and some other types, but sometimes we need to pass custom objects, and these custom objects should implement Hadoop's Writable interface. Hadoop MapReduce uses implementations of Writables for interacting with user-provided Mappers and Reducers.
Implementing the Writable interface requires two methods:
public interface Writable {
  void readFields(DataInput in) throws IOException;
  void write(DataOutput out) throws IOException;
}
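For instance, a minimal sketch of a custom Writable value type (the class name PointWritable is hypothetical) could look like this:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical custom value type: a 2D point usable as a MapReduce value.
public class PointWritable implements Writable {
    private double x;
    private double y;

    public PointWritable() {}                        // Hadoop needs a no-arg constructor
    public PointWritable(double x, double y) { this.x = x; this.y = y; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(x);                          // serialize fields in a fixed order
        out.writeDouble(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readDouble();                         // deserialize in the same order
        y = in.readDouble();
    }
}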
Why use Hadoop Writable(s)?
As we already know, data needs to be transmitted between different nodes in a distributed computing environment. This requires serialization and deserialization, i.e. converting structured data to a byte stream and vice versa. Hadoop therefore uses a simple and efficient serialization protocol to serialize data between the map and reduce phases, and the types involved are called Writables. Some examples of Writables, as already mentioned above, are IntWritable, LongWritable, BooleanWritable and FloatWritable.
Refer to https://developer.yahoo.com/hadoop/tutorial/module5.html for an example.
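As a rough, standalone illustration of that protocol (not part of a real job), an IntWritable can be serialized to a byte array and read back like this:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;

// Sketch: round-trip an IntWritable through its own serialization protocol.
public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        IntWritable original = new IntWritable(42);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));            // serialize to bytes

        IntWritable restored = new IntWritable();
        restored.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))); // deserialize

        System.out.println(restored.get());                      // prints 42
    }
}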
The WritableComparable interface is just a subinterface of the Writable and java.lang.Comparable interfaces. To implement a WritableComparable we must provide a compareTo method in addition to the readFields and write methods, as shown below:
public interface WritableComparable<T> extends Writable, Comparable<T> {
  void readFields(DataInput in) throws IOException;
  void write(DataOutput out) throws IOException;
  int compareTo(T o);
}
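For example, a minimal sketch of a custom key type (the class name YearKey is hypothetical) might look like this:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical custom key type: a year that can be sorted during the shuffle.
public class YearKey implements WritableComparable<YearKey> {
    private int year;

    public YearKey() {}
    public YearKey(int year) { this.year = year; }

    @Override
    public void write(DataOutput out) throws IOException { out.writeInt(year); }

    @Override
    public void readFields(DataInput in) throws IOException { year = in.readInt(); }

    @Override
    public int compareTo(YearKey other) {
        return Integer.compare(this.year, other.year);  // defines the key sort order
    }

    // hashCode() should be consistent with equals() so the default HashPartitioner
    // sends equal keys to the same reducer.
    @Override
    public int hashCode() { return year; }

    @Override
    public boolean equals(Object o) {
        return (o instanceof YearKey) && ((YearKey) o).year == year;
    }
}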
Comparison of types is crucial for MapReduce, where there is a sorting phase during which keys are compared with one another.
Implementing a comparator for WritableComparables, such as the org.apache.hadoop.io.RawComparator interface, can definitely help speed up your MapReduce (MR) jobs. As you may recall, an MR job consists of receiving and sending key-value pairs. The process looks like the following:
(K1, V1) -> Map -> (K2, V2)
(K2, List[V2]) -> Reduce -> (K3, V3)
The key-value pairs (K2,V2) are called the intermediary key-value pairs. They are passed from the mapper to the reducer. Before these intermediary key-value pairs reach the reducer, a shuffle and sort step is performed.
The shuffle is the assignment of the intermediary keys (K2) to reducers, and the sort is the sorting of these keys. By implementing a RawComparator to compare the intermediary keys, this extra effort can greatly improve sorting, because the RawComparator compares the keys byte by byte. Without a RawComparator, the intermediary keys would have to be completely deserialized to perform a comparison.
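A common way to provide one is to subclass org.apache.hadoop.io.WritableComparator, which implements RawComparator; a sketch for the hypothetical YearKey above (assuming its write() emits a single int first) might look like this:

import org.apache.hadoop.io.WritableComparator;

// Hypothetical raw comparator for YearKey: compares the serialized bytes directly,
// so keys never need to be deserialized during the sort.
public class YearKeyRawComparator extends WritableComparator {

    public YearKeyRawComparator() {
        super(YearKey.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int year1 = readInt(b1, s1);   // YearKey.write() wrote a single int
        int year2 = readInt(b2, s2);
        return Integer.compare(year1, year2);
    }
}

The comparator can then be registered for the key type with WritableComparator.define(YearKey.class, new YearKeyRawComparator()), or set per job with job.setSortComparatorClass(YearKeyRawComparator.class).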
Note (in short):
1) WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop MapReduce framework should implement this interface.
2) Any type which is to be used as a value in the Hadoop MapReduce framework should implement the Writable interface.
In short, a type used as a key in Hadoop must be a WritableComparable, while a type used only as a value can be just a Writable.
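As a small illustration of that rule (a sketch reusing the hypothetical YearKey and PointWritable types from above), the key and value classes would be declared on the job like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class JobSetupSketch {
    public static void main(String[] args) throws Exception {
        // Keys must be WritableComparable; values only need to be Writable.
        Job job = Job.getInstance(new Configuration(), "writable example");
        job.setMapOutputKeyClass(YearKey.class);          // map output key: WritableComparable
        job.setMapOutputValueClass(PointWritable.class);  // map output value: Writable is enough
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(PointWritable.class);
        // mapper/reducer classes, input/output paths, etc. would be configured here as usual
    }
}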
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/WritableComparable.html
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface WritableComparable<T> extends Writable, Comparable<T>
A Writable which is also Comparable.
WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable
A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.
Any key or value type in the Hadoop Map-Reduce framework implements this interface.
Writable is the interface you need to implement for a custom class to be used in Hadoop MapReduce. Two methods need to be implemented/overridden:
write() and readFields()
WritableComparable, on the other hand, is a sub-interface of Writable and Comparable, so three methods need to be implemented/overridden:
write(), readFields() and compareTo()
Because compareTo() is implemented, a class that implements WritableComparable can be used as either a key or a value in Hadoop MapReduce, whereas a class that only implements Writable can only be used as a value.
You can find examples of these two interfaces on the official website: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/WritableComparable.html
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html