I have a Hadoop MapReduce program that distributes keys unevenly across reducers. Some reducers end up with two keys, some with one, and some with none. How do I force Hadoop to send each unique key to its own reducer? I have nine unique keys of the form:
0,0
0,1
0,2
1,0
1,1
1,2
2,0
2,1
2,2
and I set job.setNumReduceTasks(9);. However, the HashPartitioner appears to hash two of the keys to the same partition, so those keys overlap at one reducer while other reducers sit idle.
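As far as I can tell, the default partitioner (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner) does nothing more than this:

public int getPartition(K key, V value, int numReduceTasks) {
    // partition = hash code, made non-negative, modulo the reducer count
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}

so with nine reduce tasks, two keys collide whenever their hash codes are equal modulo 9, even if the hash codes themselves differ.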
Would a random partitioner resolve this? Would it send each unique key to a random reducer, guaranteeing that each reducer receives exactly one key? If so, how do I enable it in place of the default?
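If a custom partitioner is the better route, here is a sketch of what I have in mind, assuming my keys are Text values of the form row,col with row and col in 0..2 (MatrixPartitioner is just a name I made up, and the Text value type is a placeholder for whatever the map output value class is):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch: send key "r,c" straight to reducer r*3 + c, so each of the
// nine keys gets its own reducer (assumes nine reduce tasks).
public class MatrixPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        String[] parts = key.toString().split(",");
        int row = Integer.parseInt(parts[0]);
        int col = Integer.parseInt(parts[1]);
        return (row * 3 + col) % numReduceTasks; // modulo in case fewer reducers are configured
    }
}

registered in the driver with job.setPartitionerClass(MatrixPartitioner.class); — is that the right way to replace the default?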
EDIT:
Can someone please explain why my output looks like this?
-rw-r--r-- 1 user supergroup 0 2018-04-19 18:58 outbin9/_SUCCESS
drwxr-xr-x - user supergroup 0 2018-04-19 18:57 outbin9/_logs
-rw-r--r-- 1 user supergroup 869 2018-04-19 18:57 outbin9/part-r-00000
-rw-r--r-- 1 user supergroup 1562 2018-04-19 18:57 outbin9/part-r-00001
-rw-r--r-- 1 user supergroup 913 2018-04-19 18:58 outbin9/part-r-00002
-rw-r--r-- 1 user supergroup 1771 2018-04-19 18:58 outbin9/part-r-00003
-rw-r--r-- 1 user supergroup 979 2018-04-19 18:58 outbin9/part-r-00004
-rw-r--r-- 1 user supergroup 880 2018-04-19 18:58 outbin9/part-r-00005
-rw-r--r-- 1 user supergroup 0 2018-04-19 18:58 outbin9/part-r-00006
-rw-r--r-- 1 user supergroup 0 2018-04-19 18:58 outbin9/part-r-00007
-rw-r--r-- 1 user supergroup 726 2018-04-19 18:58 outbin9/part-r-00008
The larger files, part-r-00001 and part-r-00003, received two keys each: 1,0 and 2,2, and 0,0 and 1,2, respectively. And notice that part-r-00006 and part-r-00007 are empty.
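To check my understanding, I applied the default HashPartitioner formula to each key myself, assuming the keys are Text and nine reduce tasks (PartitionCheck is just a throwaway test class):

import org.apache.hadoop.io.Text;

// Throwaway check: apply the HashPartitioner formula to each key
// and print which part-r-0000N file it should land in.
public class PartitionCheck {
    public static void main(String[] args) {
        String[] keys = {"0,0", "0,1", "0,2", "1,0", "1,1", "1,2", "2,0", "2,1", "2,2"};
        for (String k : keys) {
            int partition = (new Text(k).hashCode() & Integer.MAX_VALUE) % 9;
            System.out.println(k + " -> part-r-0000" + partition);
        }
    }
}

By my math this puts 1,0 and 2,2 both in partition 1, puts 0,0 and 1,2 both in partition 3, and leaves partitions 6 and 7 with no key at all, which matches the listing above. So the skew seems to come from Text.hashCode() modulo 9 rather than from anything random.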