
I have implemented WritableComparable for my map job and have passed three values to it.

public class KeyCustom implements WritableComparable<KeyCustom>
{
   private Text placeOfBirth;
   private Text country;
   private LongWritable age;
   //Implemented constructors and set methods, write, readFields, hashCode and equals
   @Override
   public int compareTo(KeyCustom arg0)
   {
      return placeOfBirth.compareTo(arg0.placeOfBirth);
   }
}

But when I log these three fields in my reducer, I can clearly see that all the people with the same country are being grouped together. It would be great if someone could help me out so that each reducer gets all the people with the same place of birth. I don't know how to do this, or whether my compareTo function is wrong.

Thanks for all the help.

user3690321

2 Answers


You're trying to solve your task with the wrong approach. What you really need is to implement a proper partitioner.

By the way, you don't need a special compareTo() implementation to do special partitioning.

UPDATE:

Try simply changing the partitioner to TotalOrderPartitioner in your job; that will probably solve your issue. Here is a decent example of what it should look like.

Roman Nikitchenko
  • It's not about partitioning. I have three separate fields, and since this is a custom WritableComparable, I just need to pass it my three fields. – user3690321 Jul 10 '14 at 20:54
  • I might be wrong, but I see it as EXACTLY about partitioning if you intend to distribute data among reducers. The other question is what you will do with the order of objects INSIDE the reducer. There, yes, you need compareTo(), but that is the next step. The current default partitioner (at least for now) balances data so that the next ordered (!) element will, with good probability, go to a different reducer. As far as I remember, it is based on the object's hash. That's why people use things like `TotalOrderPartitioner` when they need to change this behavior. By the way, it could be your case too. – Roman Nikitchenko Jul 11 '14 at 10:20
  • Oh, OK, but this Partitioner only compares two objects; how will I compare three objects? – user3690321 Jul 11 '14 at 17:40
  • A partitioner is not about comparing objects but about distributing them among reducers. Please see the links I've provided. In your case this should guarantee that people with the same place of birth go to the same reducer. – Roman Nikitchenko Jul 12 '14 at 15:28

I would say you have two options:

1) A custom Partitioner, as discussed above

OR 2) Override hashCode() as

@Override
public int hashCode() {
    return placeOfBirth.hashCode();
}

Reason

The default partitioner class works on the hashCode() of the WritableComparable key. Hence, for a custom WritableComparable, you either need to override hashCode(), which lets the partitioner segregate the map output across reducers, or implement and assign to the job your own partitioner class that considers only the placeOfBirth field for partitioning.
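To make the reasoning above concrete, here is a minimal plain-Java sketch of the arithmetic Hadoop's default HashPartitioner applies to a key: `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The names `PersonKey`, `PartitionSketch` and `partitionFor` are hypothetical stand-ins introduced only for this illustration; `PersonKey` uses `String` and `long` in place of `Text` and `LongWritable` so the example runs without Hadoop on the classpath.

```java
// Hypothetical stand-in for KeyCustom, using plain Java types instead of Hadoop Writables.
final class PersonKey {
    final String placeOfBirth;
    final String country;
    final long age;

    PersonKey(String placeOfBirth, String country, long age) {
        this.placeOfBirth = placeOfBirth;
        this.country = country;
        this.age = age;
    }

    // Hash on placeOfBirth only, so country and age do not influence the partition.
    @Override
    public int hashCode() {
        return placeOfBirth.hashCode();
    }

    // equals still compares all fields; equal keys share a placeOfBirth, so the
    // equals/hashCode contract holds.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PersonKey)) {
            return false;
        }
        PersonKey other = (PersonKey) o;
        return placeOfBirth.equals(other.placeOfBirth)
                && country.equals(other.country)
                && age == other.age;
    }
}

final class PartitionSketch {
    // Same arithmetic as the default hash partitioner: clear the sign bit so the
    // result is non-negative, then take the remainder by the number of reducers.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed reducer count for the demo
        PersonKey a = new PersonKey("Lyon", "France", 30);
        PersonKey b = new PersonKey("Lyon", "France", 52);
        PersonKey c = new PersonKey("Lyon", "Canada", 41);

        // Keys with the same placeOfBirth always land in the same partition.
        System.out.println(partitionFor(a, reducers) == partitionFor(b, reducers)); // prints true
        System.out.println(partitionFor(a, reducers) == partitionFor(c, reducers)); // prints true
    }
}
```

In a real job the same expression, applied to `key.getPlaceOfBirth().hashCode()` (assuming such a getter exists on KeyCustom), could instead live in a subclass of `org.apache.hadoop.mapreduce.Partitioner` that overrides `getPartition(key, value, numPartitions)` and is registered with `job.setPartitionerClass(...)`. That route leaves hashCode() free to cover all three fields.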

user2458922