I am working with Hadoop 0.20 and I want to produce two reduce output files instead of one. I know that MultipleOutputFormat doesn't work in Hadoop 0.20, so I added the hadoop-core 1.1.1 jar to the build path of my project in Eclipse, but it still throws the error below.

Here is my code:

public static class ReduceStage extends Reducer<IntWritable, BitSetWritable, IntWritable, Text>
{
    private MultipleOutputs mos;
    public ReduceStage() {
        System.out.println("ReduceStage");
    }

    public void setup(Context context) {
        mos = new MultipleOutputs(context);
    }

    public void reduce(final IntWritable key, final Iterable<BitSetWritable> values, Context output ) throws IOException, InterruptedException
    {
        mos.write("text1", key, new Text("Hello")); 
    }

    public void cleanup(Context context) throws IOException {
        try {
            mos.close();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

And in the run():

FileOutputFormat.setOutputPath(job, ConnectedComponents_Nodes);
job.setOutputKeyClass(MultipleTextOutputFormat.class);
MultipleOutputs.addNamedOutput(job, "text1", TextOutputFormat.class,
                IntWritable.class, Text.class);

The error is:

java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputName(Lorg/apache/hadoop/mapreduce/JobContext;Ljava/lang/String;)V
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:409)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:370)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:348)
at bitsetmr$ReduceStage.reduce(bitsetmr.java:179)
at bitsetmr$ReduceStage.reduce(bitsetmr.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)

What can I do to get MultipleOutputFormat working? Did I use the code correctly?


2 Answers


One option is to extend MultipleTextOutputFormat yourself: put the entire record into the value, and use the file name (or path) as the key.
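
A minimal sketch of what such an extension could look like with the old mapred API (the class name and the Text/Text types here are illustrative assumptions, not code from the question):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Routes each record to the file named by its key and strips the key from
// the written output, so only the value ends up in the file.
public class KeyAsFileNameOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Use the emitted key as the output file name.
        return key.toString();
    }

    @Override
    protected Text generateActualKey(Text key, Text value) {
        // Returning null suppresses the key, leaving only the value in the file.
        return null;
    }
}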

There is also the oddjob library, which ships a range of output format implementations. The one you want is MultipleLeafValueOutputFormat: it writes to the file specified by the key and writes only the value.

Now, say you have to write the following pairs, with the tab character ('\t') as your separator:

<"key1","value1"> (to be written to filename1)
<"key2","value2"> (to be written to filename2)

The output from the reducer would then be transformed into:

<"filename1","key1\tvalue1">
<"filename2","key2\tvalue2">

Also, don't forget to set the class defined above as the output format class on the job:

conf.setOutputFormat(MultipleLeafValueOutputFormat.class);

One thing to note is that you will need to work with the old mapred package rather than the mapreduce package, but that shouldn't be a problem.
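
Putting it together, the reducer side could look roughly like the sketch below. It uses the old mapred API as noted; the routing rule, the class name and the IntWritable/Text types are illustrative assumptions, and MultipleLeafValueOutputFormat is the oddjob class set on the job above:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Emits <fileName, "originalKey\tvalue"> pairs so that an output format like
// MultipleLeafValueOutputFormat can route each record to the file named by the key.
public class FileRoutingReducer extends MapReduceBase
        implements Reducer<IntWritable, Text, Text, Text> {

    @Override
    public void reduce(IntWritable key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Illustrative routing rule: even keys go to filename1, odd keys to filename2.
        Text fileName = new Text(key.get() % 2 == 0 ? "filename1" : "filename2");
        while (values.hasNext()) {
            // Fold the original key into the value, tab-separated.
            output.collect(fileName, new Text(key.get() + "\t" + values.next().toString()));
        }
    }
}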

Amar

First, check whether FileOutputFormat.setOutputName has the same code in versions 0.20 and 1.1.1. If it does not, you must compile your code against a compatible version. If it is the same, there is probably a parameter error in your command.

I encountered the same issue; removing -Dmapreduce.user.classpath.first=true from the run command fixed it. Hope that helps!
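
As a side note, when chasing a NoSuchMethodError like this one, it can help to print which jar the offending class is actually loaded from. This small check is my own suggestion, not something from the answers above:

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar (or directory) the class was loaded from, showing
        // whether the 0.20 or the 1.1.1 hadoop-core jar wins on the classpath.
        System.out.println(FileOutputFormat.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation());
    }
}

Run it with the same classpath your job uses so the result reflects what the task actually sees.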
