
I have two conceptual questions about MapReduce and Hadoop. I understand a simple single-iteration MapReduce program and know what a mapper, reducer and shuffle are, but I would still like to understand the following:

1) When is iterative MapReduce used?

2) I know an identity mapper/reducer emits exactly the input it is fed. But when would we use an identity mapper/reducer?

pranav shah

1 Answer


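1) Iterative MapReduce is used when an algorithm has to be repeated until it converges: the output of one job becomes the input of the next, and a driver program keeps launching jobs until a stopping condition is met. An example is the MapReduce version of Dijkstra's shortest path algorithm (in practice a parallel breadth-first search). At each iteration the neighbours of all active nodes are explored, and the reduce phase keeps the shortest known distance to each node and is used to check whether the destination node has been reached. Another example is Facebook's friends-of-friends (FoF) algorithm for suggesting new friends.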
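To make the iterative idea concrete, here is a minimal in-memory Python sketch of the shortest-path iteration. It is not Hadoop code: the names bfs_map, bfs_reduce, run_job and iterative_bfs are hypothetical, and it assumes an unweighted graph in which every node appears as a key. Each call to run_job stands for one MapReduce job, and the driver loop chains jobs until the target has a finite distance or nothing changes.

```python
from collections import defaultdict

INF = float("inf")


def bfs_map(node, value):
    """Map step: re-emit the node's own record and offer distance+1 to each neighbour."""
    dist, neighbours = value
    yield node, ("NODE", dist, neighbours)  # keep the graph structure for the next job
    if dist != INF:
        for n in neighbours:
            yield n, ("DIST", dist + 1, None)


def bfs_reduce(node, values):
    """Reduce step: recover the adjacency list and keep the minimum distance seen."""
    best, neighbours = INF, []
    for kind, dist, adj in values:
        if kind == "NODE":
            neighbours = adj
        best = min(best, dist)
    return node, (best, neighbours)


def run_job(graph):
    """One simulated MapReduce pass: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for node, value in graph.items():
        for key, record in bfs_map(node, value):
            groups[key].append(record)
    return dict(bfs_reduce(k, vs) for k, vs in sorted(groups.items()))


def iterative_bfs(adjacency, source, target):
    """Driver: chain jobs, feeding each job's output into the next one."""
    graph = {n: (0 if n == source else INF, adj) for n, adj in adjacency.items()}
    iterations = 0
    while True:
        new_graph = run_job(graph)
        iterations += 1
        # Stop when the target is reached or the job changed nothing (converged).
        if new_graph[target][0] != INF or new_graph == graph:
            return new_graph, iterations
        graph = new_graph


if __name__ == "__main__":
    adjacency = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
    result, n_jobs = iterative_bfs(adjacency, "A", "E")
    print(result["E"][0], n_jobs)  # prints: 3 3
```

In real Hadoop the "nothing changed" check is usually done in the driver (for example via job counters) rather than by comparing whole datasets as this sketch does.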

2) An identity mapper can be used (among other things) if you only want to sort your input. An identity reducer can be used, for example, to implement embarrassingly parallel algorithms where the mappers perform the parallel tasks but you still want the output key/value pairs to be sorted.
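A rough simulation of both identity use cases, assuming a single reducer and an in-memory stand-in for Hadoop's shuffle. The names run_job, identity_map, identity_reduce and square_map are hypothetical, not Hadoop APIs; the point is that the sorting comes from the framework's sort on the key, not from the identity functions themselves.

```python
from itertools import groupby
from operator import itemgetter


def identity_map(key, value):
    """Identity mapper: emit the input pair unchanged."""
    yield key, value


def square_map(key, value):
    """Hypothetical 'real work' mapper: each record is processed independently."""
    yield key, value * value


def identity_reduce(key, values):
    """Identity reducer: emit every value for the key unchanged."""
    for value in values:
        yield key, value


def run_job(records, mapper, reducer):
    """Simulated job with one reducer: map, sort by key (the shuffle), reduce."""
    mapped = [kv for k, v in records for kv in mapper(k, v)]
    mapped.sort(key=itemgetter(0))  # the framework sorts map output by key
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (v for _, v in group)))
    return output


if __name__ == "__main__":
    # Use case 1: identity mapper + identity reducer just sorts the input by key.
    print(run_job([("pear", 1), ("apple", 2), ("fig", 3)], identity_map, identity_reduce))
    # prints [('apple', 2), ('fig', 3), ('pear', 1)]
    # Use case 2: per-record parallel work in the mapper, identity reducer for sorted output.
    print(run_job([(3, 3), (1, 1), (2, 2)], square_map, identity_reduce))
    # prints [(1, 1), (2, 4), (3, 9)]
```

Note that Hadoop does not guarantee the order of values within one key; Python's stable sort just happens to keep them in input order here.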

Hope this got you on your way.

Note that apart from an identity reducer, you can also set NO reducer at all (zero reduce tasks); the map output is then written as-is, without being sorted.
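To illustrate the difference with zero reducers (in Hadoop this is job.setNumReduceTasks(0) on the Job), here is a small Python simulation. It assumes each inner list is one input split, and the function names are hypothetical. A map-only job writes one unsorted output per map task, while a single identity reducer merges everything into one output sorted by key.

```python
from operator import itemgetter


def identity_map(key, value):
    """Identity mapper: pass each record through unchanged."""
    yield key, value


def map_only_job(splits, mapper):
    """Zero reduce tasks: each map task writes its own output, in input order."""
    return [[kv for k, v in split for kv in mapper(k, v)] for split in splits]


def single_identity_reducer_job(splits, mapper):
    """One identity reducer: all map output is shuffled, sorted by key, written once."""
    mapped = [kv for split in splits for k, v in split for kv in mapper(k, v)]
    return sorted(mapped, key=itemgetter(0))


if __name__ == "__main__":
    splits = [[("c", 1), ("a", 2)], [("b", 3)]]  # two hypothetical input splits
    print(map_only_job(splits, identity_map))
    # prints [[('c', 1), ('a', 2)], [('b', 3)]]  (one unsorted output per map task)
    print(single_identity_reducer_job(splits, identity_map))
    # prints [('a', 2), ('b', 3), ('c', 1)]  (one merged, sorted output)
```

Skipping the reduce phase also skips the shuffle and sort, which is why map-only jobs are cheaper when ordering does not matter.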

DDW
  • Total order partitioning (complete sorting) is not possible using an identity reducer alone; it only sorts the records within each individual reducer. Another use case would be to merge the mappers' output data into a single file (by specifying a single reducer). – Ashish Dec 30 '14 at 06:53
  • You seem to be confusing (or misformulating) things: if you use a total order partitioner, your output will be fully sorted with an identity reducer; if you use no custom partitioner (i.e. the default hash partitioner), the output will be sorted per reduce task but not globally. – DDW Dec 30 '14 at 09:13
  • I said the same thing :-) You could say it's a kind of grouping (same keys together), which we can achieve using an identity reducer. – Ashish Dec 30 '14 at 11:01
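The disagreement in the comments above comes down to how keys are assigned to reducers. This Python sketch uses hypothetical names and integer keys (so Python's hash() is deterministic). It mimics a hash partitioner, similar in spirit to Hadoop's default HashPartitioner, and a range-based partitioner in the spirit of TotalOrderPartitioner, with hand-picked split points instead of sampled ones. Each identity reducer sorts only its own partition, so only the range partitioner yields fully sorted output when the reducer outputs are concatenated.

```python
from bisect import bisect_right


def hash_partition(key, num_reducers):
    """Spread keys across reducers by hash (similar in spirit to HashPartitioner)."""
    return hash(key) % num_reducers


def range_partition(key, split_points):
    """Send a key to the range it falls in (the idea behind total order partitioning)."""
    return bisect_right(split_points, key)


def run_identity_job(keys, num_reducers, partition):
    """Partition keys, then let each identity reducer sort only its own partition."""
    partitions = [[] for _ in range(num_reducers)]
    for key in keys:
        partitions[partition(key)].append(key)
    return [sorted(p) for p in partitions]  # one sorted output file per reducer


if __name__ == "__main__":
    keys = [5, 1, 8, 3, 6, 2]
    hashed = run_identity_job(keys, 2, lambda k: hash_partition(k, 2))
    ranged = run_identity_job(keys, 2, lambda k: range_partition(k, [4]))
    print(hashed)  # prints [[2, 6, 8], [1, 3, 5]]: sorted per reducer only
    print(ranged)  # prints [[1, 2, 3], [5, 6, 8]]: concatenation is fully sorted
```

With range partitioning the number of reducers must be one more than the number of split points; in Hadoop those split points are typically produced by sampling the input (InputSampler) rather than chosen by hand.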