I have a Hadoop MapReduce program that distributes keys unevenly across reducers. Some reducers end up with two keys, some with one, and some with none. How do I force Hadoop to send each unique key to its own reducer? I have nine unique keys of the form:
0,0
0,1
0,2
1,0
1,1
1,2
2,0
2,1
2,2
and I set job.setNumReduceTasks(9);. However, the HashPartitioner appears to hash two of the keys to the same partition, so those keys overlap at one reducer while other reducers sit idle.
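As far as I can tell, the default partitioner (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner) does nothing more than this:

public int getPartition(K key, V value, int numReduceTasks) {
    // partition = hash code, made non-negative, modulo the reducer count
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}

so with nine reduce tasks, two keys collide whenever their hash codes are equal modulo 9, even if the hash codes themselves differ.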
Would a random partitioner resolve this? Would it send each unique key to a random reducer, guaranteeing that each reducer receives exactly one key? If so, how do I enable it in place of the default?
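If a custom partitioner is the better route, here is a sketch of what I have in mind, assuming my keys are Text values of the form row,col with row and col in 0..2 (MatrixPartitioner is just a name I made up, and the Text value type is a placeholder for whatever the map output value class is):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch: send key "r,c" straight to reducer r*3 + c, so each of the
// nine keys gets its own reducer (assumes nine reduce tasks).
public class MatrixPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        String[] parts = key.toString().split(",");
        int row = Integer.parseInt(parts[0]);
        int col = Integer.parseInt(parts[1]);
        return (row * 3 + col) % numReduceTasks; // modulo in case fewer reducers are configured
    }
}

registered in the driver with job.setPartitionerClass(MatrixPartitioner.class); — is that the right way to replace the default?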
EDIT:
Can someone please explain why my output looks like this?
-rw-r--r-- 1 user supergroup 0 2018-04-19 18:58 outbin9/_SUCCESS
drwxr-xr-x - user supergroup 0 2018-04-19 18:57 outbin9/_logs
-rw-r--r-- 1 user supergroup 869 2018-04-19 18:57 outbin9/part-r-00000
-rw-r--r-- 1 user supergroup 1562 2018-04-19 18:57 outbin9/part-r-00001
-rw-r--r-- 1 user supergroup 913 2018-04-19 18:58 outbin9/part-r-00002
-rw-r--r-- 1 user supergroup 1771 2018-04-19 18:58 outbin9/part-r-00003
-rw-r--r-- 1 user supergroup 979 2018-04-19 18:58 outbin9/part-r-00004
-rw-r--r-- 1 user supergroup 880 2018-04-19 18:58 outbin9/part-r-00005
-rw-r--r-- 1 user supergroup 0 2018-04-19 18:58 outbin9/part-r-00006
-rw-r--r-- 1 user supergroup 0 2018-04-19 18:58 outbin9/part-r-00007
-rw-r--r-- 1 user supergroup 726 2018-04-19 18:58 outbin9/part-r-00008
The larger files, part-r-00001 and part-r-00003, received two keys each: 1,0 and 2,2, and 0,0 and 1,2, respectively. And notice that part-r-00006 and part-r-00007 are empty.
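To check my understanding, I applied the default HashPartitioner formula to each key myself, assuming the keys are Text and nine reduce tasks (PartitionCheck is just a throwaway test class):

import org.apache.hadoop.io.Text;

// Throwaway check: apply the HashPartitioner formula to each key
// and print which part-r-0000N file it should land in.
public class PartitionCheck {
    public static void main(String[] args) {
        String[] keys = {"0,0", "0,1", "0,2", "1,0", "1,1", "1,2", "2,0", "2,1", "2,2"};
        for (String k : keys) {
            int partition = (new Text(k).hashCode() & Integer.MAX_VALUE) % 9;
            System.out.println(k + " -> part-r-0000" + partition);
        }
    }
}

By my math this puts 1,0 and 2,2 both in partition 1, puts 0,0 and 1,2 both in partition 3, and leaves partitions 6 and 7 with no key at all, which matches the listing above. So the skew seems to come from Text.hashCode() modulo 9 rather than from anything random.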