
I couldn't quite understand the concept of NullWritable in Hadoop. What is it used for, and why is the outputKeyClass for the RCFile format NullWritable.class and the outputValueClass BytesRefArrayWritable.class?

Pratik Khadloya

2 Answers

This is because there is no key for RCFiles. When you read from a plain text file such as a CSV, the key is usually the byte offset of the record within the file. Since RCFiles use a columnar storage format, there isn't really a key that can identify a row, because each row is fragmented across different column groups. NullWritable essentially means "ignore this value".
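To make this concrete, here is a minimal sketch of a map-only job that turns CSV lines into RCFile rows. The class name, input layout, and column count are illustrative; the job assumes the Hive classes RCFileOutputFormat, BytesRefArrayWritable, and BytesRefWritable are on the classpath, and it uses the old org.apache.hadoop.mapred API because RCFileOutputFormat is written against it. The point is simply that the row value carries all the data, so the key is declared as NullWritable.class:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.RCFileOutputFormat;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class CsvToRCFile {

  public static class CsvMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, NullWritable, BytesRefArrayWritable> {

    public void map(LongWritable offset, Text line,
                    OutputCollector<NullWritable, BytesRefArrayWritable> out,
                    Reporter reporter) throws IOException {
      // Split the CSV line into columns and pack them into one row value.
      String[] cols = line.toString().split(",");
      BytesRefArrayWritable row = new BytesRefArrayWritable(cols.length);
      for (int i = 0; i < cols.length; i++) {
        byte[] bytes = cols[i].getBytes("UTF-8");
        row.set(i, new BytesRefWritable(bytes, 0, bytes.length));
      }
      // The row value holds all the data; there is nothing meaningful to use
      // as a key, so the NullWritable singleton is emitted as a placeholder.
      out.collect(NullWritable.get(), row);
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(CsvToRCFile.class);
    conf.setJobName("csv-to-rcfile");
    RCFileOutputFormat.setColumnNumber(conf, 3);   // example column count

    conf.setMapperClass(CsvMapper.class);
    conf.setNumReduceTasks(0);                     // map-only: write rows straight out

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(RCFileOutputFormat.class);
    conf.setOutputKeyClass(NullWritable.class);           // no row identifier
    conf.setOutputValueClass(BytesRefArrayWritable.class); // one row's columns

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
```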

Mike Park

"The Map and Reduce functions of MapReduce are both defined with respect to data structured in (key, value) pairs" (wikipedia).

It means that your data must be structured as (key, value) pairs. But sometimes there is no need for a key, and you cannot simply set it to null because the key must implement WritableComparable. That's why Hadoop provides the NullWritable class: a singleton placeholder that satisfies the interface but serializes to nothing.
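As a small sketch of that idea (file name and values are made up), the snippet below writes a SequenceFile whose key class is NullWritable. Every record reuses the shared NullWritable instance, which reads and writes zero bytes, so only the values take up space while the (key, value) contract is still satisfied:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class NullKeyExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path("values-only.seq");   // illustrative output path

    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(NullWritable.class),
        SequenceFile.Writer.valueClass(Text.class));
    try {
      // The same NullWritable singleton is used for every record;
      // only the Text values are actually stored in the file.
      writer.append(NullWritable.get(), new Text("first value"));
      writer.append(NullWritable.get(), new Text("second value"));
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
```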

Mouna