I am working with Hadoop 0.20 and I want to produce two reduce output files instead of one. I know that MultipleOutputFormat doesn't work in Hadoop 0.20, so I added the hadoop-core 1.1.1 jar to the build path of my project in Eclipse, but it still throws the error below.

Here is my code:

public static class ReduceStage extends Reducer<IntWritable, BitSetWritable, IntWritable, Text>
{
    private MultipleOutputs mos;
    public ReduceStage() {
        System.out.println("ReduceStage");
    }

    public void setup(Context context) {
        mos = new MultipleOutputs(context);
    }

    public void reduce(final IntWritable key, final Iterable<BitSetWritable> values, Context output ) throws IOException, InterruptedException
    {
        mos.write("text1", key, new Text("Hello")); 
    }

    public void cleanup(Context context) throws IOException {
        try {
            mos.close();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

And in the run():

FileOutputFormat.setOutputPath(job, ConnectedComponents_Nodes);
job.setOutputKeyClass(MultipleTextOutputFormat.class);
MultipleOutputs.addNamedOutput(job, "text1", TextOutputFormat.class,
                IntWritable.class, Text.class);

The error is:

java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputName(Lorg/apache/hadoop/mapreduce/JobContext;Ljava/lang/String;)V
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:409)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:370)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:348)
at bitsetmr$ReduceStage.reduce(bitsetmr.java:179)
at bitsetmr$ReduceStage.reduce(bitsetmr.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)

What can I do to get MultipleOutputFormat working? Did I use the code correctly?


2 Answers


One option is to extend MultipleTextOutputFormat yourself: put the entire record into the value, and use the file name (or path) as the key.
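
A minimal sketch of what such an extension could look like with the old mapred API (the class name and the Text/Text types here are illustrative assumptions, not code from the question):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Routes each record to the file named by its key and strips the key from
// the written output, so only the value ends up in the file.
public class KeyAsFileNameOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Use the emitted key as the output file name.
        return key.toString();
    }

    @Override
    protected Text generateActualKey(Text key, Text value) {
        // Returning null suppresses the key, leaving only the value in the file.
        return null;
    }
}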

There is also the oddjob library, which ships a range of output format implementations. The one you want is MultipleLeafValueOutputFormat: it writes to the file specified by the key and writes only the value.

Now, say you have to write the following pairs, with the tab character ('\t') as your separator:

<"key1","value1"> (to be written to filename1)
<"key2","value2"> (to be written to filename2)

The output from the reducer would then be transformed into:

<"filename1","key1\tvalue1">
<"filename2","key2\tvalue2">

Also, don't forget to set the class defined above as the output format class on the job:

conf.setOutputFormat(MultipleLeafValueOutputFormat.class);

One thing to note is that you will need to work with the old mapred package rather than the mapreduce package, but that shouldn't be a problem.
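
Putting it together, the reducer side could look roughly like the sketch below. It uses the old mapred API as noted; the routing rule, the class name and the IntWritable/Text types are illustrative assumptions, and MultipleLeafValueOutputFormat is the oddjob class set on the job above:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Emits <fileName, "originalKey\tvalue"> pairs so that an output format like
// MultipleLeafValueOutputFormat can route each record to the file named by the key.
public class FileRoutingReducer extends MapReduceBase
        implements Reducer<IntWritable, Text, Text, Text> {

    @Override
    public void reduce(IntWritable key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Illustrative routing rule: even keys go to filename1, odd keys to filename2.
        Text fileName = new Text(key.get() % 2 == 0 ? "filename1" : "filename2");
        while (values.hasNext()) {
            // Fold the original key into the value, tab-separated.
            output.collect(fileName, new Text(key.get() + "\t" + values.next().toString()));
        }
    }
}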

Amar

First, check whether FileOutputFormat.setOutputName has the same code in versions 0.20 and 1.1.1. If it does not, you must compile your code against a compatible version. If it is the same, there is probably a parameter error in your command.

I encountered the same issue; removing -Dmapreduce.user.classpath.first=true from the run command fixed it. Hope that helps!
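
As a side note, when chasing a NoSuchMethodError like this one, it can help to print which jar the offending class is actually loaded from. This small check is my own suggestion, not something from the answers above:

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar (or directory) the class was loaded from, showing
        // whether the 0.20 or the 1.1.1 hadoop-core jar wins on the classpath.
        System.out.println(FileOutputFormat.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation());
    }
}

Run it with the same classpath your job uses so the result reflects what the task actually sees.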
