I would like my map and reduce tasks to run in parallel, but despite trying every trick in the book, they still run sequentially. I read in How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce that the maximum number of tasks that can run in parallel per node is given by the following formula:
min (yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb,
yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores)
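With the values from my configuration files below, this works out to:
min (131072 / 16384, 64 / 8) = min (8, 8) = 8
so my expectation is that up to 8 map (or reduce) containers should run concurrently on the node.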
However, even with those settings, as you can see from the yarn-site.xml and mapred-site.xml I am using below, the tasks still run sequentially. Note that I am using open-source Apache Hadoop, not Cloudera. Would switching to Cloudera solve the problem? Also note that my input files are big enough that dfs.block.size should not be an issue.
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>131072</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>64</value>
  </property>
</configuration>
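One thing I am not sure about: as far as I understand, the scheduler also enforces a per-container maximum via yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores, and those would have to be at least as large as the per-task requests in my mapred-site.xml. A sketch of what I assume that would look like in yarn-site.xml (values chosen to match my task sizes below, not something I have verified):
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>8</value>
  </property>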
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>16384</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>16384</value>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>8</value>
  </property>
  <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>8</value>
  </property>
</configuration>
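One more thing I wondered about: mapreduce.framework.name is not set in this file, and my understanding is that if it is not set to yarn, jobs fall back to the local runner, which executes tasks one at a time regardless of the settings above. A sketch of the property I believe would be needed in mapred-site.xml, in case that is the cause:
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>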