
I am learning about the partitioner concept. Can anyone explain the piece of code below? I find it hard to understand.

public class TaggedJoiningPartitioner extends Partitioner<TaggedKey,Text> {

    @Override
    public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
        return taggedKey.getJoinKey().hashCode() % numPartitions;
    }
}

How does taggedKey.getJoinKey().hashCode() % numPartitions determine which reducer handles a given key?

user1585111

1 Answer


It's not as complex as you think once you break things down a little bit.

taggedKey.getJoinKey().hashCode() simply returns an integer. Every Java object has a hashCode() method that returns a number derived from the object. That number is not guaranteed to be unique, but equal objects always return the same hash code. You could look into the source code of the join key's class to see how it is computed if you'd like, but all you need to know is that, for typical key classes, it returns an integer based on the contents of the object.

The % operator performs modulus division, which is where you return the remainder after performing division. (8 % 3 = 2, 15 % 7 = 1, etc.).

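So let's say you have 3 partitions (numPartitions = 3), one per reducer. Every time you do modulus division of a non-negative number by 3, you'll get either 0, 1, or 2, no matter what number is passed. That result determines which of the 3 partitions, and therefore which reducer, gets the data. One caveat: in Java, % keeps the sign of the left operand, so a negative hashCode() would produce a negative result. Hadoop's default HashPartitioner guards against this by computing (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.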
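To make the arithmetic concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class and method names (PartitionDemo, naivePartition, safePartition) are made up for illustration. It compares the expression from the question with a variant that clears the sign bit first, which matters because Java's % returns a negative remainder when the hash code is negative.

```java
// Hypothetical demo class; plain Java, no Hadoop classes involved.
class PartitionDemo {

    // Same arithmetic as the getPartition method in the question.
    static int naivePartition(Object joinKey, int numPartitions) {
        return joinKey.hashCode() % numPartitions;
    }

    // Clears the sign bit first, so the result always lies in [0, numPartitions).
    static int safePartition(Object joinKey, int numPartitions) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;

        // String keys: each key maps to one fixed partition index.
        for (String key : new String[] {"customer-1", "customer-2", "customer-3"}) {
            System.out.println(key + " -> partition " + safePartition(key, numPartitions));
        }

        // Integer.valueOf(-7).hashCode() is -7, and -7 % 3 is -1 in Java,
        // which is not a valid partition index.
        System.out.println("naive: " + naivePartition(-7, numPartitions));
        System.out.println("safe:  " + safePartition(-7, numPartitions));
    }
}
```

If the join key in the question can ever have a negative hash code, the masked form is probably the safer choice inside getPartition.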

The whole idea of partitioners is that they decide which reducer receives each key, so you can use them to group related data. If you wanted to group by month, you could send every record with the string "January" to the first partition, "December" to the 12th partition, etc. Your case looks a bit more confusing at first, but really they just want to spread the data out (hopefully) evenly, so they're using a simple hash/modulus function to choose the partition. The choice is not random: the same join key always hashes to the same partition, so all records that share a join key end up at the same reducer, which is exactly what a join needs.
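The grouping effect can be simulated without Hadoop. The sketch below is only an illustration: JoinRoutingDemo, partitionFor, route, and the sample records are all invented here, and each record is just a {joinKey, payload} pair. It routes records from two made-up datasets by join key and shows that records sharing a key land in the same partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation in plain Java; no Hadoop classes are used.
class JoinRoutingDemo {

    // Non-negative variant of the hash/modulus rule from the question.
    static int partitionFor(String joinKey, int numPartitions) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Routes each {joinKey, payload} record to a partition using only the join key.
    static Map<Integer, List<String>> route(String[][] records, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String[] record : records) {
            int p = partitionFor(record[0], numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>())
                      .add(record[0] + ":" + record[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up sample data: two "customer" records and two "order" records.
        String[][] records = {
            {"42", "customer=Alice"},
            {"42", "order=book"},
            {"7", "customer=Bob"},
            {"7", "order=lamp"},
        };
        route(records, 3).forEach(
            (p, rs) -> System.out.println("partition " + p + " -> " + rs));
    }
}
```

Because the partition depends only on the join key, the customer record and the order record for the same key always arrive at the same simulated reducer, whichever dataset they came from.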

Eric Alberson
  • Thank you so much. After the partitioner gets the data, how is it passed to the reducers? – user1585111 Aug 21 '13 at 17:09
  • @user1585111 The partitioner technically doesn't "get" the data. It has one function, getPartition, like you posted above. This function is called simply to tell the mapper which reducer instance it needs to pass the data to. If this answer is correct for you, you should accept it as correct :) – Eric Alberson Aug 21 '13 at 17:31
  • Well, like I said before, the partition is chosen by hashCode plus modulus division. You can read more about partitioning here; Yahoo explains it very well: http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning – Eric Alberson Aug 21 '13 at 18:22