
I'm using Hadoop streaming to process a huge file. Say I have a file where each line is a number; I want to split this file into 2 files, one containing the odd numbers and the other the even ones.

Using Hadoop, I might specify 2 reducers for this job, because I thought that when the numbers go from mapper to reducer, which reducer a number goes to is determined by number % 2, right?

But I was told otherwise: it's not simply number % 2 but hash(number) % 2 that decides which reducer a number goes to. Is that true?

If so, how could I make this work? Can I specify a Partitioner or something to get the split I want?
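To spell out what I was told: the reducer is picked from a hash of the key emitted by the mapper, not from the number's value itself. My rough Python picture of it (just an illustration, not Hadoop's actual HashPartitioner code; pick_reducer and num_reducers are made-up names):

def pick_reducer(key, num_reducers):
    # Hadoop hashes the key the mapper emitted (a byte string in streaming),
    # so number % 2 is not what decides where a number ends up.
    return hash(key) % num_reducers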

Alcott

1 Answer


How about doing the split in your mapper?

For example, each mapper reads numbers from stdin and emits the parity as the key:

import sys

# Read numbers from stdin and emit a tab-separated (parity, number) pair.
for number in (line.strip() for line in sys.stdin):
    if int(number) % 2 == 0:
        print("EVEN\t%s" % number)   # key "EVEN", value: the number
    else:
        print("ODD\t%s" % number)    # key "ODD", value: the number

Then reduce over two keys: EVEN and ODD, writing them to the appropriate file.
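If it helps, a minimal streaming reducer for this could be as simple as the sketch below (the file name reducer.py and the exact output layout are my assumptions, not part of the question); it just strips the key and echoes the number:

#!/usr/bin/env python
# reducer.py -- sketch: each stdin line is "KEY<TAB>number" as emitted by
# the mapper above; drop the key and write the number back out.
import sys

for line in sys.stdin:
    key, _, number = line.rstrip("\n").partition("\t")
    print(number)

With something like -D mapred.reduce.tasks=2 on the streaming command, each key goes to hash(key) % 2, so EVEN and ODD usually land in different part-0000* files, but that isn't guaranteed; if you need a hard guarantee, you'd have to handle a possible collision of the two keys, e.g. with a custom partitioner or by writing both outputs from a single reducer.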

jrs