I would like to have your opinion regarding Partitioner vs MultipleOutputs.
Suppose I have a file that contains keys such as:
0:aaa
1:bbb
0:ccc
0:ddd
...
1:zzz
I would like to have 2 files: one containing the keys starting with 0: and the other containing the keys starting with 1:. Which approach should I use?
1) Use a custom Partitioner which parses the keys and returns 0 or 1 from getPartition().
2) Use MultipleOutputs.write in the reduce phase, parsing the key and passing zero or one as the namedOutput parameter of MultipleOutputs.write.
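To make the two options concrete, here is a minimal sketch of the key-parsing logic that both would share. It is plain Java with no Hadoop dependency so it can be run on its own. The class name KeyRouting and its method names are hypothetical, made up for illustration. In a real job, partitionFor would be the body of getPartition(key, value, numPartitions) in a Partitioner subclass, and namedOutputFor would supply the first argument of MultipleOutputs.write.

```java
// Hypothetical helper class for illustration only; not part of Hadoop.
final class KeyRouting {
    private KeyRouting() {}

    // Returns the numeric prefix before the first ':' (0 or 1 in the sample data).
    static int prefixOf(String key) {
        int colon = key.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Key has no prefix: " + key);
        }
        return Integer.parseInt(key.substring(0, colon));
    }

    // Approach 1: the value a custom Partitioner's getPartition() could return.
    // floorMod keeps the result inside [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(prefixOf(key), numPartitions);
    }

    // Approach 2: the name to pass as the namedOutput argument of
    // MultipleOutputs.write ("zero" or "one", as in the question).
    static String namedOutputFor(String key) {
        return prefixOf(key) == 0 ? "zero" : "one";
    }
}
```

With approach 1, the job would also need exactly 2 reduce tasks (for example via job.setNumReduceTasks(2)) so that each prefix ends up in its own part file. With approach 2, the named outputs "zero" and "one" would have to be registered beforehand with MultipleOutputs.addNamedOutput.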
Which one is better? To me, 1) seems better because each reducer deals with a single file.