
Assuming only one reducer.

My scenario is to get the list of the top N scorers in the university. The data is in format. By default, the MapReduce framework sorts the data in ascending order. But I want the list in descending order, or at least, if I could read the sorted list from the end, my work would become a lot easier. Instead of sending a lot of data to the reducer, I could restrict the data to a limit. (I want to override the predefined shuffle/sort.) Thanks & Regards, Ashwanth

Jack Daniel

1 Answer


I think a combiner is what you want. Combiners run alongside the mappers and typically do what a reducer does, but on a single mapper's output data. Often the combiner class is set to the same class as the reducer. In your case, you can sort and pick the top K elements in each mapper and send only those out.

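So instead of sending all your map output records, you will be sending roughly K * (number of mappers) records to the reducer. Keep in mind that Hadoop treats the combiner as an optimization and may run it zero or more times, so the reducer should still compute the final top K itself.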
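To make the idea concrete, here is a minimal sketch in plain Python that simulates the data flow. It is not Hadoop API code: the function names `local_top_k` and `global_top_k`, the sample records, and the `(name, score)` record shape are all assumptions made up for illustration. Each "mapper" keeps only its K best records, and a single "reducer" merges those candidates.

```python
import heapq

def local_top_k(records, k):
    """Combiner-style step (hypothetical helper): keep only the k
    highest-scoring (name, score) records from one mapper's output."""
    return heapq.nlargest(k, records, key=lambda rec: rec[1])

def global_top_k(per_mapper_results, k):
    """Reducer-style step (hypothetical helper): merge the local
    candidates from every mapper and pick the overall top k."""
    merged = [rec for part in per_mapper_results for rec in part]
    return heapq.nlargest(k, merged, key=lambda rec: rec[1])

# Made-up input splits, one list per simulated mapper.
splits = [
    [("alice", 91), ("bob", 78), ("carol", 85)],
    [("dave", 88), ("erin", 95), ("frank", 60)],
]
K = 2

# Each simulated mapper emits at most K records.
candidates = [local_top_k(split, K) for split in splits]

# The single reducer sees only the candidates, not the full input.
top = global_top_k(candidates, K)
print(top)  # [('erin', 95), ('alice', 91)]
```

Because `heapq.nlargest` returns its results from highest to lowest, the final list already comes out in descending order, so in this sketch no change to the sort order is needed.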

You can find example usage on http://wiki.apache.org/hadoop/WordCount.

Bonus - Check out http://blog.optimal.io/3-differences-between-a-mapreduce-combiner-and-reducer/ for major differences between a combiner and a reducer.