I have just started learning Hadoop and am running a MapReduce program with a custom partitioner and comparator. The problem I am facing is that the primary and secondary sorts are not applied to the composite key. Moreover, part of one composite key is replaced with the corresponding part of another composite key.

For example, I create the following keys inside the mapper:

key1 -> tagA,1 
key2 -> tagA,1 
key3 -> tagA,1
key4 -> tagA,1 
key5 -> tagA,2 
key6 -> tagA,2
key7 -> tagB,1 
key8 -> tagB,1 
key9 -> tagB,1
key10 -> tagB,1 
key11 -> tagB,2 
key12 -> tagB,2

The partitioner and grouping comparator are as follows:

//Partitioner
public static class TaggedJoiningPartitioner implements Partitioner<Text, Text> {   
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String line = key.toString();
        String tokens[] = line.split(",");
        return (tokens[0].hashCode() & Integer.MAX_VALUE)% numPartitions;
    }
    @Override
    public void configure(JobConf arg0) {
        // Intentionally left empty: this partitioner needs no job configuration.
    }
}
//Comparator
public static class TaggedJoiningGroupingComparator extends WritableComparator {

    public TaggedJoiningGroupingComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String taggedKey1[] = ((Text)a).toString().split(",");
        String taggedKey2[] = ((Text)b).toString().split(",");
        return taggedKey1[0].compareTo(taggedKey2[0]);
    }
}

In the reducer, these keys are grouped correctly by tag but are not sorted correctly. The order and content of the keys in the reducers are as follows:

//REDUCER 1
key1 -> tagA,1 
key2 -> tagA,1 
key3 -> tagA,1
key5 -> tagA,1 //2 changed by 1 here
key6 -> tagA,1 //2 changed by 1 here
key4 -> tagA,1 

//REDUCER 2
key7 ->  tagB,1 
key11 -> tagB,1 //2 changed by 1 here
key12 -> tagB,1 //2 changed by 1 here
key8 ->  tagB,1 
key9 ->  tagB,1
key10 -> tagB,1  

I have been trying to resolve this for a long time without success. Any help is appreciated.

Bruce_Wayne
  • I don't see a secondary sort here. Where is the secondary sort happening? – SSaikia_JtheRocker Sep 27 '14 at 19:14
  • I am using the old Hadoop API, so there is nothing like job.setSortComparatorClass(CompositeKeyComparator.class); available. Can you please provide the equivalent for the old API? – Bruce_Wayne Sep 27 '14 at 21:28
  • Also, I am setting the partitioner and comparator on the JobConf object as follows: conf.setPartitionerClass(TaggedJoiningPartitioner.class); conf.setOutputKeyComparatorClass(TaggedJoiningGroupingComparator.class); – Bruce_Wayne Sep 27 '14 at 21:36
  • How are you iterating in the reducer? Please show some code if you can. – SSaikia_JtheRocker Sep 28 '14 at 06:34
  • Got it fixed with the change given below; the keys on the same reducer are now sorted properly. – Bruce_Wayne Sep 28 '14 at 17:47

1 Answer

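I finally got it working. The comparator that compares only the tag was registered as the sort comparator, so keys with the same tag were treated as equal during the sort. I changed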

conf.setOutputKeyComparatorClass(TaggedJoiningGroupingComparator.class); 

to

conf.setOutputValueGroupingComparator(TaggedJoiningGroupingComparator.class);

From the Hadoop API docs:

setOutputValueGroupingComparator(Class<? extends RawComparator> theClass)
Set the user defined RawComparator comparator for grouping keys in the input to the reduce.
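
To illustrate the difference between the two settings, here is a minimal plain-Java sketch with no Hadoop dependency. It is only a simulation under stated assumptions, not how the framework is implemented: the class `SecondarySortSketch`, its two comparators, and both helper methods are hypothetical names made up for this example. The sort step orders keys by the whole composite string (roughly what the default `Text` ordering does for ASCII keys), and the grouping step only decides where one group ends and the next begins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical stand-in for the sort comparator: orders whole "tag,n" keys as strings.
    static final Comparator<String> SORT_BY_FULL_KEY = Comparator.naturalOrder();

    // Hypothetical stand-in for the grouping comparator: compares the tag only.
    static final Comparator<String> GROUP_BY_TAG =
            Comparator.comparing((String key) -> key.split(",")[0]);

    // Returns a sorted copy of the keys, using the full composite key.
    public static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_BY_FULL_KEY);
        return sorted;
    }

    // Walks the already sorted keys and starts a new group whenever the
    // grouping comparator reports that the tag has changed.
    public static Map<String, List<String>> groupSorted(List<String> sortedKeys) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        String previous = null;
        List<String> current = null;
        for (String key : sortedKeys) {
            if (previous == null || GROUP_BY_TAG.compare(previous, key) != 0) {
                current = new ArrayList<>();
                groups.put(key.split(",")[0], current);
            }
            current.add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Unordered keys, as they might leave the mappers.
        List<String> mapOutput = Arrays.asList(
                "tagB,2", "tagA,2", "tagA,1", "tagB,1", "tagA,1", "tagB,2");
        System.out.println(groupSorted(sortKeys(mapOutput)));
    }
}
```

If `SORT_BY_FULL_KEY` were replaced with `GROUP_BY_TAG` in `sortKeys`, keys with the same tag would compare as equal and their relative order would no longer depend on the number after the comma, which matches the unsorted output in the question. Note that this sketch compares the second field as a string, so a key such as `tagA,10` would sort before `tagA,2`.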
Bruce_Wayne