
Which function sorts the output of the Map tasks in the Reduce phase in the Hadoop 2.7.1 source, and when does the sorting phase begin?

I want to know which function in Hadoop is responsible for sorting the Map output, and which sorting algorithm is used.

Manjunath Ballur
aminir
    This question has been answered here: http://stackoverflow.com/questions/5779750/mapreduce-shuffle-sort-method. QuickSort is used on the Mapper side. On the Reducer side, the sorted outputs from the Mappers are merged before being reduced. – Manjunath Ballur Oct 15 '15 at 17:34
  • Thanks. Another question: in the literature, the reduce phase is described as having three sub-phases: 1) copy (shuffle), 2) sort, 3) reduce. You say that on the reducer the sort is only a merge. When does this merge begin (after all map task outputs have been copied to the reducer?) – aminir Oct 16 '15 at 18:50
  • Yes, there are 3 phases: copy, sort and reduce. But the Fetcher (which fetches data from each of the mappers) also merges the data, which was already sorted on the map side, maintaining its original sort order. You can refer to the code in Shuffle.java and Fetcher.java. – Manjunath Ballur Oct 16 '15 at 19:08
  • Thanks again. When does the merge (sort) begin in the reduce phase of the Hadoop implementation? Which event triggers it in the source? – aminir Oct 16 '15 at 19:31
  • It is determined by the configuration parameter "mapreduce.reduce.merge.inmem.threshold". Read its description here: https://hadoop.apache.org/docs/r0.23.11/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml – Manjunath Ballur Oct 16 '15 at 19:42
  • Hi Ballur, thanks very much for your attention. In Eclipse, how can I watch the map output and the merge execution in debug mode with breakpoints? And what is the main purpose of Fetcher.java in Hadoop? – aminir Oct 17 '15 at 18:11
  • Fetcher runs on the Reducer side and fetches the output from the different Mappers. I don't know how to debug using Eclipse. – Manjunath Ballur Oct 19 '15 at 06:04
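The merge trigger described in the "mapreduce.reduce.merge.inmem.threshold" comment can be modeled roughly as below. This is a hypothetical Java sketch, not Hadoop's merge manager: the class and method names are invented, and the byte-based limit is an assumption standing in for a memory-based trigger. A merge starts as soon as either the number of buffered map outputs reaches the threshold or their combined size reaches the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a reducer-side buffer that decides when to start an
// in-memory merge. Names are illustrative, not Hadoop's actual classes.
public class InMemMergeTrigger {
    private final int inMemThreshold;    // analogous to mapreduce.reduce.merge.inmem.threshold
    private final long mergeLimitBytes;  // assumed memory limit that also triggers a merge
    private final List<Long> segments = new ArrayList<>();
    private long usedBytes = 0;
    private int mergesStarted = 0;

    public InMemMergeTrigger(int inMemThreshold, long mergeLimitBytes) {
        this.inMemThreshold = inMemThreshold;
        this.mergeLimitBytes = mergeLimitBytes;
    }

    // Called each time a fetched map output of the given size is kept in memory.
    // Returns true if this output caused a merge to start.
    public boolean addMapOutput(long sizeBytes) {
        segments.add(sizeBytes);
        usedBytes += sizeBytes;
        // A threshold of 0 or less disables the count-based trigger in this sketch.
        boolean countReached = inMemThreshold > 0 && segments.size() >= inMemThreshold;
        boolean memoryReached = usedBytes >= mergeLimitBytes;
        if (countReached || memoryReached) {
            startMerge();
            return true;
        }
        return false;
    }

    // Stand-in for merging the buffered segments and spilling them to disk.
    private void startMerge() {
        mergesStarted++;
        segments.clear();
        usedBytes = 0;
    }

    public int mergesStarted() {
        return mergesStarted;
    }

    public static void main(String[] args) {
        InMemMergeTrigger trigger = new InMemMergeTrigger(3, 1_000);
        long[] outputs = {100, 100, 100, 900, 200};
        for (long size : outputs) {
            System.out.println("fetched " + size + " bytes, merge started: " + trigger.addMapOutput(size));
        }
        System.out.println("merges: " + trigger.mergesStarted());
    }
}
```

With a threshold of 3 and a 1,000-byte limit, the third fetched output starts a merge by count, and the fifth starts one by size, so the merge can begin long before every map output has been copied.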

1 Answer


The map output is sorted using the Quicksort technique when the intermediate KV (key-value) pairs generated by the Map tasks are spilled to disk; each partition of the sorted output then goes to a particular Reducer.
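A minimal sketch of the map-side step, assuming a simplified in-memory buffer of Java objects: each record is tagged with its reducer partition, and the buffer is sorted by (partition, key) with a textbook quicksort, so every reducer's slice is contiguous and already key-sorted. The names `SpillSort`, `Rec`, `partitionFor` and `sortForSpill` are hypothetical; Hadoop's own code sorts index metadata over a serialized byte buffer rather than a list of objects, and this quicksort is not a copy of its implementation. `partitionFor` uses the same formula as Hadoop's default `HashPartitioner`, applied here to `String` keys.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a map-side spill (hypothetical names, not Hadoop's API):
// records are sorted by (partition, key) so each reducer's slice is contiguous.
public class SpillSort {
    static final class Rec {
        final int partition;
        final String key;
        final int value;

        Rec(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    // Same formula as Hadoop's default HashPartitioner, applied to String keys.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Order by partition first, then by key within a partition.
    static int compare(Rec a, Rec b) {
        int c = Integer.compare(a.partition, b.partition);
        return c != 0 ? c : a.key.compareTo(b.key);
    }

    // Textbook recursive quicksort (Lomuto partition scheme) over the buffer.
    static void quickSort(List<Rec> buf, int lo, int hi) {
        if (lo >= hi) return;
        Rec pivot = buf.get(hi);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (compare(buf.get(j), pivot) < 0) {
                swap(buf, i++, j);
            }
        }
        swap(buf, i, hi); // pivot lands in its final position
        quickSort(buf, lo, i - 1);
        quickSort(buf, i + 1, hi);
    }

    static void swap(List<Rec> buf, int i, int j) {
        Rec tmp = buf.get(i);
        buf.set(i, buf.get(j));
        buf.set(j, tmp);
    }

    // Tag each word with its partition (value 1, word-count style) and sort.
    public static List<Rec> sortForSpill(String[] words, int numReducers) {
        List<Rec> buf = new ArrayList<>();
        for (String w : words) {
            buf.add(new Rec(partitionFor(w, numReducers), w, 1));
        }
        quickSort(buf, 0, buf.size() - 1);
        return buf;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "apple", "kiwi", "date"};
        for (Rec r : sortForSpill(words, 2)) {
            System.out.println(r.partition + "\t" + r.key + "\t" + r.value);
        }
    }
}
```

Sorting on the partition number first is what lets a single spill file serve every Reducer: each one reads only its own contiguous, sorted range.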

On the Reducer side, the KV pairs are sorted again, this time with the Merge sort technique, and grouped by key. Sorting is needed on the Reducer side because pairs with the same intermediate key may come from any number of Map tasks.
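The reducer-side merge can be sketched as a k-way merge over runs that are each already sorted by key, using a min-heap ordered by each run's current head. Because the merged stream comes out in key order, pairs with equal keys from different Map tasks end up adjacent and can be grouped for the reduce call. This is an illustrative Java sketch with hypothetical names (`ReduceSideMerge`, `Cursor`, `mergeAndGroup`), not Hadoop's merge code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch of the reducer-side merge: runs already sorted by key (one per map
// task) are combined with a heap-driven k-way merge, then grouped by key.
public class ReduceSideMerge {
    // Cursor into one sorted run (one map task's output for this reducer).
    static final class Cursor {
        final List<Map.Entry<String, Integer>> run;
        int pos = 0;

        Cursor(List<Map.Entry<String, Integer>> run) {
            this.run = run;
        }

        Map.Entry<String, Integer> head() {
            return run.get(pos);
        }
    }

    public static Map<String, List<Integer>> mergeAndGroup(List<List<Map.Entry<String, Integer>>> runs) {
        // Min-heap on the current head key of each run.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> a.head().getKey().compareTo(b.head().getKey()));
        for (List<Map.Entry<String, Integer>> run : runs) {
            if (!run.isEmpty()) heap.add(new Cursor(run));
        }
        // LinkedHashMap keeps keys in the order the merge emits them (sorted).
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Map.Entry<String, Integer> e = c.head();
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            c.pos++;
            if (c.pos < c.run.size()) heap.add(c); // re-insert with its next head
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> map1 = List.of(Map.entry("apple", 1), Map.entry("kiwi", 1));
        List<Map.Entry<String, Integer>> map2 = List.of(Map.entry("apple", 1), Map.entry("fig", 1));
        List<Map.Entry<String, Integer>> map3 = List.of(Map.entry("date", 1), Map.entry("kiwi", 1));
        Map<String, List<Integer>> groups = mergeAndGroup(List.of(map1, map2, map3));
        groups.forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

For the three sample runs, the result groups `apple` and `kiwi` with two values each and `date` and `fig` with one, in sorted key order. Merging pre-sorted runs this way costs O(n log k) for n pairs from k Map tasks, which is why the Reducer does not need to re-sort from scratch.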