
I have a question about the transfer protocols the Hadoop framework uses to copy mapper output (which is stored locally on the mapper node) to reducer tasks (which do not run on the same node):

- I have read in some blogs that HTTP is used for the shuffle phase.
- I have also read that HDFS data transfers (used by MapReduce jobs) are done directly over TCP/IP sockets.
- I have read about RPC in Hadoop: The Definitive Guide.

Any pointers/references would be of great help.

SurjanSRawat

1 Answer


Hadoop uses HTTP servlets for intermediate data shuffling. See the figure below (taken from "JVM-Bypass for Efficient Hadoop Shuffling" by Wang et al.):

[Figure: Intermediate data shuffling in Hadoop]

For a more careful treatment, have a look at the "JVM-Bypass for Efficient Hadoop Shuffling" paper published in 2013 (full text available).
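To make the HTTP-based shuffle concrete: in classic (Hadoop 1.x) MapReduce, each reduce task fetches its partition of a map task's output by issuing an HTTP GET to a servlet on the mapper's TaskTracker. The sketch below only builds such a fetch URL; the exact parameter names mirror the 1.x MapOutputServlet style, and all hosts, ports, and IDs are made-up placeholders for illustration, not values from a real cluster.

```java
// Sketch: how a reduce task might address a mapper's output over HTTP.
// Assumptions: URL shape follows Hadoop 1.x's MapOutputServlet
// (http://<tasktracker>:<port>/mapOutput?job=...&map=...&reduce=...);
// the host, port, and IDs below are hypothetical examples.
public class ShuffleUrl {

    // Build the HTTP URL a reducer would GET to pull one map output partition.
    static String mapOutputUrl(String host, int port, String jobId,
                               String mapAttemptId, int reducePartition) {
        return String.format("http://%s:%d/mapOutput?job=%s&map=%s&reduce=%d",
                             host, port, jobId, mapAttemptId, reducePartition);
    }

    public static void main(String[] args) {
        // Hypothetical TaskTracker host/port and job/attempt IDs.
        String url = mapOutputUrl("tracker01", 50060,
                                  "job_201301010000_0001",
                                  "attempt_201301010000_0001_m_000003_0",
                                  2);
        System.out.println(url);
    }
}
```

The key point the URL illustrates: the shuffle is a pull model over plain HTTP, with the reduce partition number passed as a query parameter, which is why the paper above can profile (and bypass) this servlet path specifically.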

Denis