You could organize your map/reduce computation like this:
Map input: default (TextInputFormat, so each record is one line of text)
Map output: "key: number, value: word"
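Sorting phase (by key):
Here you need to override the default sort comparator so that keys come out in decreasing order.
Reduce: a single reducer (job.setNumReduceTasks(1)), so that one reducer sees every key in globally sorted order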
Reduce input: "key: number, value: word"
Reduce output: "key: word, value: (number, rank)"
Keep a counter as a field of the reducer, so it persists across calls to reduce(). For each incoming key-value pair, increment the counter and emit its current value as that word's rank.
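To make the data flow concrete, here is a plain-Java simulation of the phases above that runs without Hadoop. It is only a sketch under assumptions: the map input is taken to be word-count output in the form word<TAB>count (the answer itself only says "default"), a TreeMap with a reversed comparator stands in for Hadoop's shuffle/sort, and RankPipelineSketch and rank() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RankPipelineSketch {
    // Simulates map -> sort (descending) -> single reducer with a counter.
    // Assumption: each input line looks like "word<TAB>count".
    static List<String> rank(List<String> lines) {
        // Map + sort stand-in: re-key each word by its count, largest count first.
        TreeMap<Integer, List<String>> byCount =
                new TreeMap<Integer, List<String>>(Collections.<Integer>reverseOrder());
        for (String line : lines) {
            String[] parts = line.split("\t");
            int count = Integer.parseInt(parts[1].trim());
            byCount.computeIfAbsent(count, k -> new ArrayList<>()).add(parts[0]);
        }
        // Reduce stand-in: the counter is never reset between keys.
        int counter = 0;
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : byCount.entrySet()) {
            for (String word : group.getValue()) {
                counter++; // one rank per key-value pair, so tied counts get distinct ranks
                out.add(word + "\t(" + group.getKey() + ", " + counter + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a\t5", "b\t9", "c\t5");
        for (String line : rank(input)) {
            System.out.println(line);
        }
    }
}
```

For the sample input this emits b with rank 1, then a and c with ranks 2 and 3. Because the counter advances once per pair, words with equal counts still receive different ranks; handling ties differently would need extra logic in the reducer.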
Edit: Here is a code snippet of a custom descending sort comparator:
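// Imports needed in the enclosing job class:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparator;

// Sorts IntWritable keys in decreasing order, comparing their serialized bytes directly.
public static class IntComparator extends WritableComparator {
    public IntComparator() {
        super(IntWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        // An IntWritable is serialized as a 4-byte big-endian int starting at offset s.
        int v1 = readInt(b1, s1);
        int v2 = readInt(b2, s2);
        // Swapping the operands reverses the natural (ascending) order.
        return Integer.compare(v2, v1);
    }
}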
Don't forget to actually set it as the comparator for your job:
job.setSortComparatorClass(IntComparator.class);
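The comparator works on raw bytes because IntWritable writes its value as a 4-byte big-endian int, so the keys can be compared without deserializing them into objects. The plain-Java sketch below has no Hadoop dependency: it reproduces that encoding with ByteBuffer and applies the same reversed comparison to show the resulting order. RawDescendingSketch and its methods are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class RawDescendingSketch {
    // Encodes an int as 4 big-endian bytes, the same layout IntWritable produces.
    static byte[] serialize(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Same idea as IntComparator.compare: decode both ints, then reverse the order.
    static int compareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        int v1 = ByteBuffer.wrap(b1, s1, 4).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(v2, v1);
    }

    // Serializes the values, sorts the raw byte keys, then decodes them again.
    static List<Integer> sortDescending(int... values) {
        List<byte[]> keys = new ArrayList<>();
        for (int v : values) {
            keys.add(serialize(v));
        }
        keys.sort((a, b) -> compareDescending(a, 0, b, 0));
        List<Integer> out = new ArrayList<>();
        for (byte[] k : keys) {
            out.add(ByteBuffer.wrap(k).getInt());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortDescending(3, -1, 42, 7)); // prints [42, 7, 3, -1]
    }
}
```

Swapping the operands of Integer.compare, rather than negating a comparison result, keeps the reversal explicit and avoids any dependence on the magnitude of the value a comparison returns.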