
I'm using Hadoop streaming to process a huge file. Say I have a file where each line is a number; I want to split this file into 2 files, one containing the odd numbers and the other the even ones.

Using Hadoop, I might specify 2 reducers for this job, because I thought that when the numbers go from mapper to reducer, which reducer a number goes to is determined by number % 2, right?

But I was told otherwise: it's not simply number % 2 but hash(number) % 2 that decides which reducer a number goes to. Is that true?

If so, how could I make this work? Can I specify a Partitioner or something to get the split I want?
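To spell out what I was told: the reducer is picked from a hash of the key emitted by the mapper, not from the number's value itself. My rough Python picture of it (just an illustration, not Hadoop's actual HashPartitioner code; pick_reducer and num_reducers are made-up names):

def pick_reducer(key, num_reducers):
    # Hadoop hashes the key the mapper emitted (a byte string in streaming),
    # so number % 2 is not what decides where a number ends up.
    return hash(key) % num_reducers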

Alcott

1 Answer


How about doing the split in your mapper?

For example, each mapper reads numbers from stdin and emits the parity as the key:

import sys

# Read numbers from stdin and emit a tab-separated (parity, number) pair.
for number in (line.strip() for line in sys.stdin):
    if int(number) % 2 == 0:
        print("EVEN\t%s" % number)   # key "EVEN", value: the number
    else:
        print("ODD\t%s" % number)    # key "ODD", value: the number

Then reduce over two keys: EVEN and ODD, writing them to the appropriate file.
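If it helps, a minimal streaming reducer for this could be as simple as the sketch below (the file name reducer.py and the exact output layout are my assumptions, not part of the question); it just strips the key and echoes the number:

#!/usr/bin/env python
# reducer.py -- sketch: each stdin line is "KEY<TAB>number" as emitted by
# the mapper above; drop the key and write the number back out.
import sys

for line in sys.stdin:
    key, _, number = line.rstrip("\n").partition("\t")
    print(number)

With something like -D mapred.reduce.tasks=2 on the streaming command, each key goes to hash(key) % 2, so EVEN and ODD usually land in different part-0000* files, but that isn't guaranteed; if you need a hard guarantee, you'd have to handle a possible collision of the two keys, e.g. with a custom partitioner or by writing both outputs from a single reducer.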

jrs