I am trying to measure the time consumed by each (key_a,value_a) pair transferred from a mapper Mapper_i to a reducer Reducer_j.

In other words, I would like to know how long (key_a,value_a) takes from leaving Mapper_i to reaching Reducer_j.

Is there any way to get this transfer time from mappers to reducers?

  • Not really. Key/value mapper output pairs are bundled together into a chunk, and that chunk is transferred to the reducer. Why do you think this would be a useful metric? – Donald Miner Oct 07 '13 at 15:04
  • In fact, I am trying to measure the network delay for data transfer between mappers and reducers. I need this metric to see whether the network takes a disproportionate amount of time to deliver data from mappers to reducers. If it does, the network is a bottleneck. But I need measurements to draw such a conclusion. – user2262938 Oct 07 '13 at 16:53
  • You are better off using a tool like Ganglia to view network utilization. There are too many buffers and mechanisms between the keys/values in mappers and reducers for a per-pair timing to give you a reasonable number. – Donald Miner Oct 07 '13 at 17:03
  • I have tried Ganglia but couldn't get much information. Here are some details. I am simply running the wordcount example on Amazon Elastic MapReduce. I enabled Ganglia and managed to get its figures for both bytes_in and bytes_out for each machine. The figures give the MIN/MAX/AVG for data transfer. The AVG keeps changing over time, which makes it imprecise. What I need is the data transfer rate (from Ganglia or by any other means) DURING the REAL TRANSFER of data from mappers to reducers. If you have any idea how to get such information using Ganglia, that would be great. – user2262938 Oct 07 '13 at 17:15
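
For what the last comment asks (the rate during the actual transfer rather than an average over the whole job), one possible approach is to export the raw samples and restrict them to the shuffle window yourself. The following is only a minimal sketch under stated assumptions: the samples are assumed to be (timestamp, bytes per second) pairs such as a bytes_in series exported for one node, the window bounds are assumed to be taken from the job's own timestamps, and the function name `window_stats` and all the numbers are hypothetical.

```python
def window_stats(samples, start, end):
    """Estimate bytes moved and mean rate inside [start, end].

    samples: list of (unix_time, bytes_per_sec) tuples sorted by time.
             Assumption: each value is a rate, not a cumulative counter.
    """
    inside = [(t, r) for t, r in samples if start <= t <= end]
    if len(inside) < 2:
        raise ValueError("need at least two samples inside the window")

    # Trapezoidal integration of the rate curve gives approximate bytes.
    total_bytes = 0.0
    for (t0, r0), (t1, r1) in zip(inside, inside[1:]):
        total_bytes += (r0 + r1) / 2.0 * (t1 - t0)

    duration = inside[-1][0] - inside[0][0]
    return total_bytes, total_bytes / duration


# Hypothetical samples taken every 15 seconds on one reducer node.
samples = [(0, 0.0), (15, 100.0), (30, 4000.0),
           (45, 6000.0), (60, 2000.0), (75, 50.0)]

# Hypothetical shuffle window: seconds 30 to 60 of the job.
total, mean_rate = window_stats(samples, 30, 60)
print(total, mean_rate)
```

This still measures aggregate traffic on a node, not the latency of an individual (key, value) pair, and its accuracy is limited by the sampling interval.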
