In a related question (How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce), I ask for formulas relating the number of concurrently running mappers/reducers to the YARN and MR2 memory parameters. It turns out that on Elastic MapReduce, when my cluster has between 2 and 10 c3.2xlarge nodes, variations of the formulas mentioned there work okay, giving me 7-9 concurrently running mappers per node; but when the number of c3.2xlarge nodes is 20 or 40, I get cluster underutilization: only 1-4 mappers run per node. Since my job is CPU-bound, this is particularly awful: MR2 delivers _half_ the performance of MR1 for me.
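
For reference, this is the kind of capacity rule I've been working from — a minimal sketch of the memory-bound containers-per-node formula; the c3.2xlarge numbers are what I believe the EMR defaults to be, not verified values:

```python
# Minimal sketch of the memory-bound capacity rule I'm assuming applies here.
# The c3.2xlarge defaults below (yarn.nodemanager.resource.memory-mb and
# mapreduce.map.memory.mb) are illustrative guesses, not verified EMR values.

def max_mappers_per_node(node_memory_mb: int, map_container_mb: int) -> int:
    """Upper bound on concurrent map containers per node, memory only."""
    return node_memory_mb // map_container_mb

if __name__ == "__main__":
    # Assumed example values for a single c3.2xlarge node on EMR.
    print(max_mappers_per_node(11520, 1440))  # roughly 8 concurrent mappers
```

A per-node bound like this doesn't depend on the cluster size at all, which is why the drop to 1-4 mappers per node at 20-40 nodes surprises me.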
Why is this happening?