Control number of mappers on each node in cluster

Question

I have a very small 2-node Hadoop-HBase cluster on which I run MapReduce jobs. I use Hadoop 2.5.2. Each node has 64 GB of memory, of which 32 GB is reserved for MapReduce, with the following configuration in yarn-site.xml:

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>32768</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>15</value>
</property>

My resource requirement is 2 GB for each mapper/reducer that gets executed. I have configured this in mapred-site.xml. Given these configurations, with a cluster total of about 64 GB of memory and 30 vcores, I see about 31 mappers or 31 reducers executing in parallel.
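A rough sanity check of these numbers can be sketched as follows. It is only an illustration under stated assumptions: that each task requests one vcore, and that the MapReduce ApplicationMaster occupies one 2 GB container of its own. Under those assumptions, the observed 31 parallel tasks would match memory-only accounting (32 slots minus one for the ApplicationMaster) rather than the 30-vcore limit; whether the scheduler here really ignores vcores is not confirmed.

```python
# Values taken from the yarn-site.xml above and the 2 GB per-task request.
NODES = 2
NODE_MEMORY_MB = 32768   # yarn.nodemanager.resource.memory-mb
NODE_VCORES = 15         # yarn.nodemanager.resource.cpu-vcores
TASK_MEMORY_MB = 2048    # 2 GB per mapper/reducer
TASK_VCORES = 1          # assumption: one vcore requested per task
AM_CONTAINERS = 1        # assumption: the ApplicationMaster holds one 2 GB container


def containers_by_memory():
    """Cluster-wide container slots if only memory is counted."""
    return NODES * (NODE_MEMORY_MB // TASK_MEMORY_MB)


def containers_by_vcores():
    """Cluster-wide container slots if only vcores are counted."""
    return NODES * (NODE_VCORES // TASK_VCORES)


memory_slots = containers_by_memory()
vcore_slots = containers_by_vcores()
# Slots left for mappers/reducers once the ApplicationMaster is placed.
task_slots = memory_slots - AM_CONTAINERS

print(memory_slots, vcore_slots, task_slots)  # prints: 32 30 31
```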

While all this is fine, there is one part that I am trying to figure out. The number of mappers or reducers executing in parallel is not the same on both nodes; one of the nodes runs a higher number of tasks than the other. Why does this happen? Can this be controlled? If so, how?

I suppose YARN does not treat these as the resources of a node but rather as the resources of the cluster, and spawns tasks wherever it can in the cluster. Is this understanding correct? If not, what is the correct explanation for this behaviour during an MR execution?
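One way to observe the per-node distribution while a job runs is the ResourceManager REST endpoint `/ws/v1/cluster/nodes`. The sketch below is a hedged illustration, not a verified recipe for this cluster: the host URL is a placeholder, the sample payload is made up purely to exercise the parsing, and the field names (`nodeHostName`, `numContainers`, `usedMemoryMB`, `availMemoryMB`) should be checked against the REST documentation for the Hadoop version in use.

```python
import json
from urllib.request import urlopen


def fetch_nodes(rm_url):
    """Fetch the node list from the ResourceManager REST API.

    rm_url is a placeholder such as "http://resourcemanager-host:8088".
    """
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/nodes") as response:
        return json.load(response)


def containers_per_node(payload):
    """Map each host name to (containers, used MB, available MB)."""
    nodes = (payload.get("nodes") or {}).get("node") or []
    return {
        node["nodeHostName"]: (
            node["numContainers"],
            node["usedMemoryMB"],
            node["availMemoryMB"],
        )
        for node in nodes
    }


# Hypothetical payload, invented for illustration only (not measured data).
SAMPLE_PAYLOAD = {
    "nodes": {
        "node": [
            {"nodeHostName": "node-a", "numContainers": 16,
             "usedMemoryMB": 32768, "availMemoryMB": 0},
            {"nodeHostName": "node-b", "numContainers": 15,
             "usedMemoryMB": 30720, "availMemoryMB": 2048},
        ]
    }
}

for host, (containers, used_mb, avail_mb) in containers_per_node(SAMPLE_PAYLOAD).items():
    print(host, containers, used_mb, avail_mb)
```

Against a live cluster, `containers_per_node(fetch_nodes("http://resourcemanager-host:8088"))` would give the same summary, with the host name replaced by the real ResourceManager address.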

Are both nodes the same, with the same free resources? Is your HDFS balanced? — Rahim Dastar, Oct 10 '18 at 07:12
@RahimDastar Both nodes have the same configuration. Both have a guaranteed 32 GB free. However, the remaining 32 GB varies. When I check `free -g` now, I see 38 GB of available memory on Node A and 40 GB on Node B. I would not say HDFS is balanced: Node A has 3% less disk usage than Node B in the HDFS data node directories, and I have sometimes seen this difference go up to 8%. The MR jobs are initiated from Node A. — PKU, Oct 10 '18 at 12:18
Where exactly are you referring to? I am not using Hortonworks, jfyi. — PKU, Oct 10 '18 at 14:38
Do you have other files in HDFS, or are all the files in HDFS processed by your job? What is your replication factor? — Rahim Dastar, Oct 10 '18 at 19:29
Moving the conversation to a [chat](https://chat.stackoverflow.com/rooms/181635/discussion-between-knp-and-rahim-dastar). — PKU, Oct 10 '18 at 19:48

0 Answers