MapReduce Network Bandwidth

Question

I am trying to measure the time consumed by each (key_a,value_a) pair transferred from a mapper Mapper_i to a reducer Reducer_j.

In other words, I would like to know how long (key_a,value_a) takes from leaving Mapper_i to reaching Reducer_j.

Is there any way to get this transfer time from mappers to reducers?

Not really. key/value mapper output pairs are bundled together into a chunk and that chunk is transferred over to the reducer. Why do you think this would be a useful metric? — Donald Miner, Oct 07 '13 at 15:04
In fact, I am trying to measure the network delay for data transfer between mappers and reducers. I need this metric to see whether the network is taking too long to deliver data from mappers to reducers. If so, the network is a bottleneck. But I need measurements in order to draw such conclusions. — user2262938, Oct 07 '13 at 16:53
You are better off using a tool like Ganglia to view network utilization. There are too many buffers and mechanisms between the keys/values in mappers and reducers for a per-pair measurement to give you a reasonable number. — Donald Miner, Oct 07 '13 at 17:03
I have tried Ganglia but couldn't get much information. Here are some details. I am simply running the wordcount example on Amazon Elastic MapReduce. I enabled Ganglia and got its figures for both bytes_in and bytes_out for each machine. The figures give the MIN/MAX/AVG for data transfer. The AVG keeps changing over time, which makes it imprecise. What I need is the data transfer rate (from Ganglia or by any other means) DURING the ACTUAL TRANSFER of data from mappers to reducers. If you have any idea how to get such information using Ganglia, that would be great. — user2262938, Oct 07 '13 at 17:15
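
One possible way to get a rate for the transfer period only, rather than a running AVG, is to export the raw bytes_in samples for a reducer node and average them over just the shuffle window. The sketch below assumes the samples are (timestamp, bytes per second) pairs and that the shuffle start and end times are known from somewhere else, for example the job logs. The function name `window_transfer` and all sample values are hypothetical; this is not part of Ganglia or Hadoop.

```python
from typing import List, Tuple

# One sample: (unix timestamp in seconds, rate in bytes per second).
# Assumption: this is how the exported bytes_in series is shaped.
Sample = Tuple[float, float]


def window_transfer(samples: List[Sample], start: float, end: float) -> Tuple[float, float]:
    """Return (total_bytes, mean_rate) for the window [start, end].

    The rate curve is integrated with the trapezoidal rule; the rate at the
    window edges is linearly interpolated between neighbouring samples and
    clamped to the first/last sample outside the sampled range.
    """
    if not samples:
        raise ValueError("no samples given")
    if end <= start:
        raise ValueError("end must be later than start")

    pts = sorted(samples)

    def rate_at(t: float) -> float:
        if t <= pts[0][0]:
            return pts[0][1]
        if t >= pts[-1][0]:
            return pts[-1][1]
        for (t0, r0), (t1, r1) in zip(pts, pts[1:]):
            if t0 <= t <= t1:
                if t1 == t0:
                    return r1
                return r0 + (r1 - r0) * (t - t0) / (t1 - t0)
        return pts[-1][1]  # not reached; keeps the return type total

    # Keep only samples strictly inside the window, then add the two edges.
    inner = [(t, r) for t, r in pts if start < t < end]
    clipped = [(start, rate_at(start))] + inner + [(end, rate_at(end))]

    total = sum((t1 - t0) * (r0 + r1) / 2.0
                for (t0, r0), (t1, r1) in zip(clipped, clipped[1:]))
    return total, total / (end - start)


if __name__ == "__main__":
    # Hypothetical bytes_in samples for one reducer node.
    demo = [(0.0, 0.0), (10.0, 100.0), (20.0, 100.0), (30.0, 0.0)]
    total_bytes, mean_rate = window_transfer(demo, start=5.0, end=25.0)
    print(total_bytes, mean_rate)
```

Note that this still measures all traffic into the node during the window, not only shuffle traffic, and the result is only as fine-grained as the sampling interval.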

0 Answers