
I'd like to set the # of reduce tasks to be exactly equal to the # of available reduce slots in one job.

By default the # of reduce tasks is being calculated as ~1.75 times the # of reduce slots available (on Elastic MapReduce). I notice that my job completes its reduce tasks very uniformly, so it would be better to run 1 reducer per reduce slot in the job.

But how can I identify the cluster metrics from within my job configuration?

David Parks
  • Have you looked at this thread? http://stackoverflow.com/questions/11523480/how-to-collect-hadoop-cluster-size-number-of-cores-information – anonymous1fsdfds Dec 17 '12 at 13:31

1 Answer


You can use the ClusterMetrics class to get status information about the current state of the Map-Reduce cluster, such as the size of the cluster, the number of blacklisted and decommissioned trackers, the slot capacity of the cluster, and the number of currently occupied/reserved map and reduce slots.
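
For example, here is a minimal sketch assuming the newer org.apache.hadoop.mapreduce API (ClusterMetrics is obtained from a Cluster object there; Job.getInstance is Hadoop 2.x). It reads the cluster's total reduce slot capacity and sets the job's reducer count to match. The job name is just illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.ClusterMetrics;
    import org.apache.hadoop.mapreduce.Job;

    public class OneReducerPerSlot {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Ask the cluster for its current metrics.
            Cluster cluster = new Cluster(conf);
            ClusterMetrics metrics = cluster.getClusterStatus();

            // Total reduce slot capacity across all TaskTrackers.
            int reduceSlots = metrics.getReduceSlotCapacity();

            // Request exactly one reduce task per available reduce slot.
            Job job = Job.getInstance(conf, "one-reducer-per-slot");
            job.setNumReduceTasks(reduceSlots);

            // ... set mapper, reducer, input/output paths as usual,
            // then job.waitForCompletion(true);
        }
    }

On an older 0.20/1.x cluster (which is what EMR ran at the time), the equivalent would be the old API's JobClient.getClusterStatus().getMaxReduceTasks(). Note that sizing reducers to exactly the slot capacity leaves no slack for a failed or speculatively re-executed task, which is one reason the usual heuristics suggest 0.95 or 1.75 times capacity.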

Tariq