I'm using Hadoop Streaming to process a huge file. Say each line of the file is a number, and I want to split it into 2 files: one containing the odd numbers, the other the even ones.
Using Hadoop, I might specify 2 reducers for this job, because I thought that when the numbers go from mapper to reducer, which reducer a number goes to is determined by number % 2. Right?
But I was told otherwise: it's not simply number % 2 but hash(number) % 2 that determines which reducer a number goes to. Is that true?
If so, how can I make it work the way I want? Can I specify a Partitioner or something to get that behavior?
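For reference, this is roughly the mapper I had in mind: it emits the number's parity (0 for even, 1 for odd) as the key, hoping the shuffle would then route each parity class to its own reducer. The script name `mapper.py` and the key layout are just my own choices, not anything prescribed by Hadoop:

```python
#!/usr/bin/env python
# mapper.py -- a sketch of what I intended for Hadoop Streaming.
import sys

def map_line(line):
    """Return a 'key<TAB>value' record where the key is the
    number's parity: 0 for even, 1 for odd."""
    n = int(line)
    return "%d\t%s" % (n % 2, line)

if __name__ == "__main__":
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(map_line(line))
```

My assumption was that with 2 reducers, key 0 would go to one reducer and key 1 to the other, but if the framework actually computes hash(key) % numReducers, I don't see how to guarantee that without plugging in my own partitioner.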