
I thought that they refer to the Reducer's output, but in my program I have

public static class MyMapper extends Mapper< LongWritable, Text, Text, Text >

and

public static class MyReducer extends Reducer< Text, Text, NullWritable, Text >

so if I have

job.setOutputKeyClass( NullWritable.class );

job.setOutputValueClass( Text.class );

I get the following Exception

Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.Text

but if I have

job.setOutputKeyClass( Text.class );

there is no problem.

Is there something wrong with my code, or does this happen because of NullWritable or something else?

Also, do I have to use job.setInputFormatClass and job.setOutputFormatClass? My program runs correctly without them.



Calling job.setOutputKeyClass( NullWritable.class ); sets the key type expected as output from both the map and reduce phases (and setOutputValueClass does the same for the value type).

If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods. These implicitly set the input types expected by the Reducer.

(source: Yahoo Developer Tutorial)
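
Putting it together for your case, here is a minimal driver sketch (assuming the new org.apache.hadoop.mapreduce API; Job.getInstance is the Hadoop 2.x+ call, on older releases use new Job(conf, "my job"); MyDriver and the input/output paths are placeholders, and MyMapper/MyReducer are the classes from your question, assumed to be nested in or imported into this class). It declares the map output types separately from the final output types:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my job");
        job.setJarByClass(MyDriver.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Mapper emits (Text, Text); declare it explicitly because it differs
        // from the final output types below
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Reducer emits (NullWritable, Text); these are the job's final output types
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}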

Regarding your second question: the default InputFormat is TextInputFormat. It treats each line of each input file as a separate record and performs no parsing. You only need to call those methods if you want to process your input or output in a different format; here are some examples:

InputFormat             | Description                                      | Key                                      | Value
--------------------------------------------------------------------------------------------------------------------------------------------------------
TextInputFormat         | Default format; reads lines of text files        | The byte offset of the line              | The line contents
KeyValueInputFormat     | Parses lines into key, val pairs                 | Everything up to the first tab character | The remainder of the line
SequenceFileInputFormat | A Hadoop-specific high-performance binary format | user-defined                             | user-defined
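
For instance, if your input already consists of tab-separated key/value lines, the driver can switch to the key/value format. A sketch, assuming the new-API class org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat (the table's KeyValueInputFormat under its current name):

import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// In the driver, replace the default TextInputFormat:
job.setInputFormatClass(KeyValueTextInputFormat.class);

// The Mapper's input types must then match, e.g.
// public static class MyMapper extends Mapper< Text, Text, Text, Text >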

The default instance of OutputFormat is TextOutputFormat, which writes (key, value) pairs on individual lines of a text file. Some examples below:

OutputFormat             | Description
---------------------------------------------------------------------------------------------------------
TextOutputFormat         | Default; writes lines in "key \t value" form
SequenceFileOutputFormat | Writes binary files suitable for reading into subsequent MapReduce jobs
NullOutputFormat         | Disregards its inputs

(source: Other Yahoo Developer Tutorial)
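
Likewise, if the output of this job feeds another MapReduce job, you could switch the driver to a binary output format. A sketch, assuming the new-API class org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat:

import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Write the reduce output as a binary SequenceFile instead of "key \t value" text lines
job.setOutputFormatClass(SequenceFileOutputFormat.class);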

  • Oh, you're right, I didn't know about the two methods you mentioned. After adding them my program runs. Of course I use Job, not JobConf, but the methods exist there as well. Thank you very much! Could you tell me something about the last part of my question? – nik686 Jan 08 '13 at 22:50
  • @nik686 I added the answer to the last part of your question above. – Charles Menguy Jan 09 '13 at 00:49
  • So the defaults are TextInputFormat and TextOutputFormat; that's why my program runs. Thank you very much! – nik686 Jan 09 '13 at 10:20
  • @CharlesMenguy Am I understanding this right? **If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods.** I see this error even when my Mapper and Reducer outputs match: `Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text` It seems to correlate directly with the InputFormat class, e.g. matching TextInputFormat's expectation of a LongWritable key and Text value. – Rakesh Iyer Jul 04 '22 at 18:04