
I'd like to set the # of reduce tasks to be exactly equal to the # of available reduce slots in one job.

By default the # of reduce tasks is being calculated as ~1.75 times the # of reduce slots available (on Elastic MapReduce). I notice that my job completes its reduce tasks very uniformly, so it would be better to run 1 reducer per reduce slot in the job.

But how can I identify the cluster metrics from within my job configuration?

David Parks
  • Have you looked at this thread? http://stackoverflow.com/questions/11523480/how-to-collect-hadoop-cluster-size-number-of-cores-information – anonymous1fsdfds Dec 17 '12 at 13:31

1 Answer


You can use the ClusterMetrics class to get status information about the current state of the Map-Reduce cluster, such as the size of the cluster, the number of blacklisted and decommissioned trackers, the slot capacity of the cluster, and the number of currently occupied/reserved map and reduce slots.
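
For example, here is a minimal sketch assuming the newer org.apache.hadoop.mapreduce API (ClusterMetrics is obtained from a Cluster object there; Job.getInstance is Hadoop 2.x). It reads the cluster's total reduce slot capacity and sets the job's reducer count to match. The job name is just illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.ClusterMetrics;
    import org.apache.hadoop.mapreduce.Job;

    public class OneReducerPerSlot {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Ask the cluster for its current metrics.
            Cluster cluster = new Cluster(conf);
            ClusterMetrics metrics = cluster.getClusterStatus();

            // Total reduce slot capacity across all TaskTrackers.
            int reduceSlots = metrics.getReduceSlotCapacity();

            // Request exactly one reduce task per available reduce slot.
            Job job = Job.getInstance(conf, "one-reducer-per-slot");
            job.setNumReduceTasks(reduceSlots);

            // ... set mapper, reducer, input/output paths as usual,
            // then job.waitForCompletion(true);
        }
    }

On an older 0.20/1.x cluster (which is what EMR ran at the time), the equivalent would be the old API's JobClient.getClusterStatus().getMaxReduceTasks(). Note that sizing reducers to exactly the slot capacity leaves no slack for a failed or speculatively re-executed task, which is one reason the usual heuristics suggest 0.95 or 1.75 times capacity.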

Tariq