3

I need to implement the functionality below using Hadoop MapReduce:

1) One mapper reads its input from one source, and another mapper reads its input from a different source.

2) I need to pass the outputs of both mappers to a single reducer for further processing.

Is there any way to meet the above requirement in Hadoop MapReduce?

HarshaKP
  • If your mappers output the same type of data, you can use `MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass);` to add multiple mappers. – zsxwing Sep 18 '13 at 11:09
  • Thanks for the info, but I need to feed two different mapper outputs into a single reducer. How can I achieve this? – HarshaKP Sep 20 '13 at 09:55
  • 2
    I usually convert all the different types of value to BytesWritable, and recover them to the actual values in the reducer. A tag also needs to be attached, so that the reducer knows how to recover each value. – zsxwing Sep 20 '13 at 10:00
  • @zsxwing +1 Yes this is the way – WestCoastProjects Feb 10 '14 at 02:57
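The tagging approach from the comments can be sketched in plain Java. This is an illustrative, Hadoop-free sketch (the `TAG_*` constants and helper names are made up, and `byte[]` stands in for the payload of a `BytesWritable`): each mapper prefixes its value bytes with a one-byte tag, and the reducer inspects the tag to decide how to decode the rest.

```java
import java.io.*;

public class TaggedValue {
    // Illustrative tags: one per mapper/source type.
    public static final byte TAG_TEXT = 0;
    public static final byte TAG_INT  = 1;

    // Mapper side: prepend the tag, then the serialized value.
    public static byte[] encodeText(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(TAG_TEXT);
        out.writeUTF(s);
        return bos.toByteArray();
    }

    public static byte[] encodeInt(int v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(TAG_INT);
        out.writeInt(v);
        return bos.toByteArray();
    }

    // Reducer side: read the tag first, then recover the original value.
    public static Object decode(byte[] raw) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        byte tag = in.readByte();
        switch (tag) {
            case TAG_TEXT: return in.readUTF();
            case TAG_INT:  return in.readInt();
            default: throw new IOException("unknown tag " + tag);
        }
    }
}
```

In a real job the `byte[]` would be wrapped in a `BytesWritable` before being emitted, and unwrapped with `getBytes()`/`getLength()` in the reducer.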

2 Answers

0

You can create a custom Writable and populate it in the mapper. Later, in the reducer, you can read the custom Writable object back and perform the necessary business logic.
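The custom-writable idea might look like the sketch below. To keep the example self-contained, Hadoop's `Writable` interface is stubbed locally with the same two methods the real `org.apache.hadoop.io.Writable` declares; the field names are purely illustrative.

```java
import java.io.*;

// Stub of org.apache.hadoop.io.Writable, reproduced here only so the
// sketch compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A value class that can carry the fields produced by either mapper.
public class CustomWritable implements Writable {
    private String text = "";  // e.g. populated by one mapper
    private int count;         // e.g. populated by the other mapper

    public void set(String text, int count) {
        this.text = text;
        this.count = count;
    }

    public String getText() { return text; }
    public int getCount()   { return count; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(text);
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        text = in.readUTF();
        count = in.readInt();
    }
}
```

In the reducer, Hadoop deserializes each value through `readFields`, so the getters return exactly what the mapper wrote.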

0

MultipleInputs.addInputPath is what you are looking for. This is how your configuration would look. Make sure both AnyMapper1 and AnyMapper2 emit the same output key and value types expected by MergeReducer:

JobConf conf = new JobConf(Merge.class);
conf.setJobName("merge");

conf.setOutputKeyClass(IntWritable.class); 
conf.setOutputValueClass(Text.class); 
conf.setReducerClass(MergeReducer.class);
conf.setOutputFormat(TextOutputFormat.class);

MultipleInputs.addInputPath(conf, inputDir1, SequenceFileInputFormat.class, AnyMapper1.class);
MultipleInputs.addInputPath(conf, inputDir2, TextInputFormat.class, AnyMapper2.class);

FileOutputFormat.setOutputPath(conf, outputPath);
JobClient.runJob(conf);
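For completeness, a MergeReducer matching that configuration might look like the sketch below (old `org.apache.hadoop.mapred` API, same as the JobConf above; the identity-style body is illustrative, as the answer does not show the actual reduce logic):

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Receives the merged stream of (IntWritable, Text) pairs emitted by
// AnyMapper1 and AnyMapper2; values for the same key from both inputs
// arrive together in one reduce call.
public class MergeReducer extends MapReduceBase
        implements Reducer<IntWritable, Text, IntWritable, Text> {

    @Override
    public void reduce(IntWritable key, Iterator<Text> values,
                       OutputCollector<IntWritable, Text> output,
                       Reporter reporter) throws IOException {
        while (values.hasNext()) {
            // Both mappers emit Text, so the values can be handled uniformly.
            output.collect(key, values.next());
        }
    }
}
```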
Jerry Ragland