I am running a job on EMR with mrjob; I am using AMI version 2.4.7 and Hadoop version 1.0.3.
I want to specify the number of reducers for a job, because I want to give the next job more parallelism. Reading the answers to other questions on this site, I gathered that I should set these parameters, and so I did:
mapred.reduce.tasks=576
mapred.tasktracker.reduce.tasks.maximum=24
However, it seems like the second option is not picked up: both the EMR and the Hadoop interfaces report that there are 576 reduce tasks to run, but the capacity of the cluster remains at 72 (r3.8xlarge instances).
I even see that the option is set in var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_XXX/job.xml:
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>24</value></property>
Still, only the default number (9) of reducers run at the same time on each instance.
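To make the numbers concrete, here is a rough sketch of the capacity I observe versus the capacity I expected. It assumes that cluster reduce capacity is simply the per-instance slot count times the number of task-tracker instances. The instance count of 8 is inferred from 72 / 9, not read from the cluster, and the helper name `reduce_capacity` is only illustrative.

```python
# Rough slot arithmetic; the instance count is inferred, not measured.

def reduce_capacity(instances: int, slots_per_instance: int) -> int:
    """Reduce tasks that can run concurrently across the cluster."""
    return instances * slots_per_instance

observed_capacity = 72   # cluster reduce capacity reported by the interfaces
default_slots = 9        # reducers actually running per instance
requested_slots = 24     # mapred.tasktracker.reduce.tasks.maximum
total_reducers = 576     # mapred.reduce.tasks

# Assumption: capacity = instances * slots, so instances = 72 / 9
instances = observed_capacity // default_slots
expected_capacity = reduce_capacity(instances, requested_slots)

# Waves of reducers needed to finish the job (ceiling division)
waves_observed = -(-total_reducers // observed_capacity)
waves_expected = -(-total_reducers // expected_capacity)

print(f"instances (inferred): {instances}")
print(f"capacity observed/expected: {observed_capacity}/{expected_capacity}")
print(f"reduce waves observed/expected: {waves_observed}/{waves_expected}")
```

Under these assumptions the 576 reducers run in 8 waves instead of the 3 I was aiming for.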
Why is the option not picked up by EMR? Or is there a different way to force a higher number of reducers on an instance?