I couldn't quite understand the concept of NullWritable in Hadoop. What is it used for, and why is the outputKeyClass for an RCFile format NullWritable.class and the outputValueClass BytesRefArrayWritable.class?

Pratik Khadloya
2 Answers
This is because there is no key for RCFiles. When you read from a plain text file such as a CSV, the key is usually the byte offset of the line in the file. Since RCFiles use a columnar storage format, there isn't really a key that identifies a row, because each row is split across column groups. NullWritable essentially means "ignore this value".
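For example, a job writing RCFile output might be configured roughly like this. This is only a sketch against Hive's RCFileOutputFormat (old mapred API) as I recall it; the RcFileJobSetup class name, the outputDir parameter, and the column count are made up for illustration:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.RCFileOutputFormat;
    import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class RcFileJobSetup {
        public static JobConf configure(String outputDir, int numColumns) {
            JobConf conf = new JobConf(RcFileJobSetup.class);

            // RCFile stores data column-group by column-group, so there is no
            // per-row key to emit: the key class is NullWritable and each row's
            // column values travel together in one BytesRefArrayWritable.
            conf.setOutputFormat(RCFileOutputFormat.class);
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputValueClass(BytesRefArrayWritable.class);

            // RCFileOutputFormat needs to know how many columns each row has.
            RCFileOutputFormat.setColumnNumber(conf, numColumns);

            FileOutputFormat.setOutputPath(conf, new Path(outputDir));
            return conf;
        }
    }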

Mike Park
"The Map and Reduce functions of MapReduce are both defined with respect to data structured in (key, value) pairs" (wikipedia).
It means that your data must be structured in (key, value) pairs. But sometimes there is no need to use a key, and you cannot set it to Null
because a key must implement WritableComparable
. That's why Hadoop created a NullWritable
class.
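As an illustration, here is a minimal sketch of a reducer that only cares about its values and uses NullWritable.get() (a stateless singleton that serializes to zero bytes) to fill the key slot. The ValuesOnlyReducer name and the Text types are just examples, not something from your job:

    import java.io.IOException;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Emits only the values; the key column of the output is NullWritable,
    // so nothing is actually written out for the key.
    public class ValuesOnlyReducer extends Reducer<Text, Text, NullWritable, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // NullWritable.get() always returns the same shared instance.
                context.write(NullWritable.get(), value);
            }
        }
    }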

Mouna