What are the advantages of using NullWritable for null keys/values over using null texts (i.e. new Text(null))? I see the following in the book «Hadoop: The Definitive Guide»:
"NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to, or read from, the stream. It is used as a placeholder; for example, in MapReduce, a key or a value can be declared as a NullWritable when you don’t need to use that position - it effectively stores a constant empty value. NullWritable can also be useful as a key in SequenceFile when you want to store a list of values, as opposed to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling NullWritable.get()."
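To make "zero-length serialization" concrete, here is a minimal sketch of how I read that passage, using only java.io so it runs without Hadoop. The names WritableSketch, NullWritableSketch, TextSketch and ZeroLengthDemo are hypothetical stand-ins I made up for illustration, not Hadoop's classes. The fixed 4-byte length prefix in TextSketch is also an assumption chosen for simplicity; Hadoop's Text writes a variable-length integer before the UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the write side of a Writable-style interface.
interface WritableSketch {
    void write(DataOutput out) throws IOException;
}

// Stand-in for a NullWritable-like type: an immutable singleton that writes nothing.
final class NullWritableSketch implements WritableSketch {
    private static final NullWritableSketch INSTANCE = new NullWritableSketch();

    private NullWritableSketch() {}

    static NullWritableSketch get() {
        return INSTANCE;
    }

    @Override
    public void write(DataOutput out) {
        // Intentionally empty: zero bytes go to the stream.
    }
}

// Stand-in for a text-like type: a length prefix followed by UTF-8 bytes.
// Assumption: a fixed 4-byte int prefix, used here only to keep the sketch short.
final class TextSketch implements WritableSketch {
    private final byte[] bytes;

    TextSketch(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(bytes.length);
        out.write(bytes);
    }
}

final class ZeroLengthDemo {
    // Serializes the value into an in-memory buffer and returns the byte count.
    static int serializedSize(WritableSketch w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(NullWritableSketch.get())); // 0
        System.out.println(serializedSize(new TextSketch("")));       // 4
        System.out.println(serializedSize(new TextSketch("abc")));    // 7
    }
}
```

If this reading is right, the null-like placeholder costs nothing per record, while even an empty text-like value still pays for its length prefix on every record.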
I do not clearly understand how the output is written out when NullWritable is used. Will there be a single constant value at the beginning of the output file indicating that the keys or values of this file are null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how are null texts actually serialized?
Thanks,
Venkat