Hadoop Map Reduce Program

Question

While trying a MapReduce example from the book Hadoop in Action, which is based on the Hadoop 0.20 API, I got this error:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text

But as far as I can tell, I am passing everything correctly. It would be really helpful if someone could help me with this.

Here is the code. It's the same code that is in the book.

@SuppressWarnings("unused")
public class CountPatents extends Configured implements Tool {

    @SuppressWarnings("deprecation")
    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int count = 0;
            while (values.hasNext()) {
                count = count + 1;
                values.next();
            }
            output.collect(key, new IntWritable(count));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, CountPatents.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);
    }
}

score 8 · Answer 1 · answered Mar 23 '11 at 23:48

From a quick look (not running the code locally), it looks like you are setting the output of the job to be of type Text when you set job.setOutputValueClass(Text.class);, but the output type on your reducer is set to IntWritable. That's likely the error.

tau-neutrino


I concur. Hadoop forces you to repeat the types for your keys in three places - the mapper definition, the reducer definition and the job configuration. All three must match for the job to run. – Mark Tozzi Mar 24 '11 at 00:40
I tried doing that but it did not work. Then I converted the program such that the input and output are basically text only. That worked!! – Sri Mar 25 '11 at 04:46
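
For later readers, here is a rough sketch of why changing only setOutputValueClass may not be enough. As far as I know, when no map output value class is set, the job's output value class is also used as the expected type for what the mapper emits, so a mapper that emits Text fails as soon as the job output value becomes IntWritable. The sketch below is a simplified, stdlib-only model of that rule, not Hadoop's code: MiniJobConf, check and accepts are hypothetical names, String stands in for Text, and Integer stands in for IntWritable.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the map-output value check. None of this is Hadoop code:
// String stands in for Text and Integer stands in for IntWritable.
final class MapOutputTypeCheckSketch {

    // Hypothetical stand-in for the job configuration.
    static final class MiniJobConf {
        private final Map<String, Class<?>> classes = new HashMap<>();

        void setOutputValueClass(Class<?> c) {
            classes.put("output.value", c);
        }

        void setMapOutputValueClass(Class<?> c) {
            classes.put("map.output.value", c);
        }

        Class<?> getOutputValueClass() {
            // Default picked for this sketch only.
            return classes.getOrDefault("output.value", String.class);
        }

        // Falls back to the job output value class when nothing was set for the map side.
        Class<?> getMapOutputValueClass() {
            Class<?> c = classes.get("map.output.value");
            return c != null ? c : getOutputValueClass();
        }
    }

    // Returns an error message if the emitted value has the wrong class, or null if it is fine.
    static String check(MiniJobConf conf, Object value) {
        Class<?> expected = conf.getMapOutputValueClass();
        if (value.getClass() != expected) {
            return "Type mismatch in value from map: expected " + expected.getName()
                    + ", received " + value.getClass().getName();
        }
        return null;
    }

    static boolean accepts(MiniJobConf conf, Object value) {
        return check(conf, value) == null;
    }

    public static void main(String[] args) {
        MiniJobConf conf = new MiniJobConf();

        // Only the job output value class is set, as in the suggested one-line fix.
        conf.setOutputValueClass(Integer.class);
        System.out.println(check(conf, "5000000"));

        // Declaring the map output value class separately removes the mismatch.
        conf.setMapOutputValueClass(String.class);
        System.out.println(accepts(conf, "5000000"));
    }
}
```

In the real job, the equivalent of the second step would be calling setMapOutputKeyClass and setMapOutputValueClass with the types the mapper actually emits.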

score 0 · Answer 2 · answered Mar 18 '13 at 09:16

The error is in the reducer's output type.

Your Reduce class definition is as follows:

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable>

so the output value should be of type IntWritable.

However, you have set job.setOutputValueClass(Text.class);

So, as per the configuration, the output of the reducer should be Text.

Solution: in the configuration, add the following lines, which declare the types the mapper emits (Text keys and Text values):

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

and modify: job.setOutputValueClass(IntWritable.class);

Then try running the job again.

score 0 · Answer 3 · answered Dec 05 '13 at 18:53

The mapper emits <Text, Text>, so set:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

See the JobConf methods setMapOutputKeyClass and setMapOutputValueClass.

Arun A K


score 0 · Answer 4 · answered Jul 20 '17 at 18:23

In your reducer function you are using OutputCollector<Text, IntWritable>, which means the output key class is Text and the output value class is IntWritable. However, in the run function you have set job.setOutputKeyClass(Text.class); job.setOutputValueClass(Text.class);.

Change job.setOutputValueClass(Text.class) to job.setOutputValueClass(IntWritable.class) and you are good to go!

It is also better to set the map output key and value classes explicitly to avoid any discrepancy. If they are not set, the job's output key and value classes are used for the map output as well, which is wrong here because the mapper emits Text values.

Hadoop uses a mechanism based on the Writable interface instead of native Java serialization. Unlike Java serialization, this mechanism does not store the class name in the serialized data. It is not possible to deserialize the byte arrays that represent Writable instances without knowing which class they should be deserialized into (the reducer's input key and value classes), so the class names must be provided explicitly to instantiate these classes between the mapper and the reducer. You provide this information by calling setMapOutputKeyClass and setMapOutputValueClass on the job instance.
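
To illustrate the serialization point, here is a minimal stdlib-only sketch. It does not use Hadoop: IntBox is a hypothetical stand-in for an IntWritable-like value that writes only its raw fields through a DataOutputStream, and it is compared with standard Java serialization of an Integer. The reader has to construct an IntBox itself before reading, because nothing in the four bytes says which class they belong to.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class WritableStyleSketch {

    // Hypothetical stand-in for an IntWritable-like value; not a Hadoop class.
    static final class IntBox {
        int value;

        void write(DataOutputStream out) throws IOException {
            out.writeInt(value); // raw field only, no class name
        }

        void readFields(DataInputStream in) throws IOException {
            value = in.readInt();
        }
    }

    // Serializes the value the Writable way: just the fields.
    static byte[] writableStyleBytes(int v) {
        IntBox box = new IntBox();
        box.value = v;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            box.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Serializes the same number with Java serialization, which records class metadata.
    static byte[] javaSerializationBytes(int v) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(Integer.valueOf(v));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // The reader must already know the target class; the bytes do not say.
    static int readBack(byte[] bytes) {
        IntBox box = new IntBox();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            box.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return box.value;
    }

    public static void main(String[] args) {
        byte[] compact = writableStyleBytes(42);
        byte[] described = javaSerializationBytes(42);
        System.out.println("Writable-style bytes: " + compact.length);
        System.out.println("Java serialization bytes: " + described.length);
        System.out.println("Read back: " + readBack(compact));
    }
}
```

The compact form is smaller, but it only works when the reading side is told the class up front, which is the role the map output key and value class settings play in the job configuration.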

score 0 · Answer 5 · answered May 31 '11 at 18:07

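You missed a call that sets the map output value class. For the mapper in the question, which emits Text values, that is:

job.setMapOutputValueClass(Text.class);

The same problem occurs with the new 0.20 interface, which uses the new "Job" object in place of JobConf.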

Mojo


score -1 · Answer 6 · answered Oct 22 '21 at 12:56

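import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CountPatents extends Configured implements Tool {

    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            // Swap key and value so that records are grouped by the original value.
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            // Count how many values were grouped under this key.
            int count = 0;
            while (values.hasNext()) {
                count = count + 1;
                values.next();
            }
            output.collect(key, new IntWritable(count));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, CountPatents.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        // The reducer emits <Text, IntWritable>.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The mapper emits <Text, Text>; without these calls the map output
        // types default to the job output types above.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);
    }
}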

Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation](https://meta.stackexchange.com/q/114762/349538) would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you’ve made. — helvete, Oct 22 '21 at 13:21
