4

I have several heterogeneous inputs that need to be processed by different mappers to produce a homogeneous map output, which can then be reduced by multiple instances of a single reducer. Can this be done more elegantly than concatenating the outputs of all the mappers and feeding them to an identity mapper that just emits whatever it receives? I am using the Hadoop Streaming API with Python, so it's a bit more complicated than using the MultipleInputs Java interface.

whoever

1 Answer

0

What you are looking for is MultipleInputs. Write a separate mapper for each kind of heterogeneous input.

In your driver, map each input path to its respective mapper.

Each mapper should convert its output to a common format that the reducer will consume.

http://bytepadding.com/big-data/map-reduce/multipleinputs-in-map-reduce
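
Since the question is about Hadoop Streaming, where the Java MultipleInputs driver setup is not directly available, here is a rough Python sketch of the same idea: a single mapper script picks a per-format mapping function based on the path of the file it is currently reading, and every function emits the same tab-separated key/value layout. It assumes Streaming exposes the current input file as the environment variable `mapreduce_map_input_file` (`map_input_file` on older releases). The two record formats, the path fragments, and all function names are made up for illustration.

```python
#!/usr/bin/env python
"""Sketch: one Streaming mapper that dispatches on the input file path."""
import os
import sys


def map_csv(line):
    # Hypothetical format A: "user_id,amount"
    user_id, amount = line.split(",")
    yield user_id.strip(), amount.strip()


def map_log(line):
    # Hypothetical format B: "timestamp user_id amount"
    _timestamp, user_id, amount = line.split()
    yield user_id, amount


# Hypothetical path fragments; adjust them to the real input directories.
MAPPERS = [
    ("/input/csv/", map_csv),
    ("/input/logs/", map_log),
]


def pick_mapper(path):
    """Return the mapping function registered for this input path."""
    for fragment, mapper in MAPPERS:
        if fragment in path:
            return mapper
    raise ValueError("no mapper registered for input path: %r" % path)


def run(lines, path):
    """Yield 'key<TAB>value' records in one common format for the reducer."""
    mapper = pick_mapper(path)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        for key, value in mapper(line):
            yield "%s\t%s" % (key, value)


if __name__ == "__main__":
    # Assumption: Streaming exports job properties as environment variables
    # with dots replaced by underscores.
    input_path = (os.environ.get("mapreduce_map_input_file")
                  or os.environ.get("map_input_file"))
    if input_path:
        for record in run(sys.stdin, input_path):
            print(record)
    else:
        sys.stderr.write("input file variable not set; nothing mapped\n")
```

With something like this, the job would use this one script as its `-mapper` and pass each input directory with its own `-input` option, so no concatenation step or identity mapper is needed. Dispatching on a path fragment is only one possible convention; it works as long as each input format lives under a distinguishable directory.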

KrazyGautam