
I have a Cloudera cluster with 7 worker nodes, each with:

  • 30GB RAM
  • 4 vCPUs

Here are some of the configurations I found important (from Google) for tuning the performance of my cluster. I am currently running with the following (sketched as XML right after the list):

  • yarn.nodemanager.resource.cpu-vcores => 4
  • yarn.nodemanager.resource.memory-mb => 17 GB (the rest is reserved for the OS and other processes)
  • mapreduce.map.memory.mb => 2 GB
  • mapreduce.reduce.memory.mb => 2 GB
  • `nproc` on each node reports 4 (number of processing units available)
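
For reference, a minimal sketch of how these values could look in `yarn-site.xml` and `mapred-site.xml` (on a Cloudera installation these are normally managed through Cloudera Manager rather than edited by hand; the values are simply the ones listed above, converted to MB):

```xml
<!-- yarn-site.xml: resources each NodeManager advertises to YARN -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>17408</value> <!-- ~17 GB; the rest is left for the OS and other processes -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>

<!-- mapred-site.xml: per-task container sizes -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
```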

My concern is this: the ResourceManager shows 119 GB of Available Memory (7 nodes × 17 GB), which looks right. But when I run a heavy sqoop job, even at its peak the cluster uses only ~59 GB of memory, leaving ~60 GB unused.

One way I can see to fix this unused-memory issue is to increase mapreduce.map|reduce.memory.mb to 4 GB, so that up to 16 GB could be used per node.
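
If I go this route, I assume the change would look roughly like this in `mapred-site.xml` (illustrative values; as I understand it, the task JVM heap in `*.java.opts` is conventionally kept at around 80% of the container size):

```xml
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<!-- task JVM heap, kept below the 4 GB container limit -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3276m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>
</property>
```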

The other way is to increase the number of containers per node, but I am not sure how to do that.

  • 4 cores x 7 nodes = 28 possible containers. With 3 being used by other processes, only 25 are currently available for the sqoop job.

What would be the right configuration to improve cluster performance in this case? Can I increase the number of containers, say to 2 containers per core? And is that recommended?

Any help or suggestions on the cluster configuration would be highly appreciated. Thanks.

pratpor
  • Do you use the DefaultResourceCalculator ? Or did you configure to use the DominantResourceCalculator ? – Nicomak Jun 20 '16 at 02:52
  • Can you post your `yarn-site.xml` and `mapred-site.xml` configs ? – Nicomak Jun 20 '16 at 03:00
  • I am using cloudera installation. Couldn't find the property `yarn.nodemanager.container-monitor.resource-calculator.class`. Using FairScheduler as scheduler.class if that helps. Any specific config shall I give from `yarn-site.xml` and `mapred-site.xml`? – pratpor Jun 20 '16 at 06:46
  • yarn.scheduler.maximum-allocation-mb ? – Nicomak Jun 20 '16 at 06:50
  • and yarn.scheduler.capacity.resource-calculator in `capacity-scheduler.xml` – Nicomak Jun 20 '16 at 06:52
  • Added `DominantResourceCalculator` as `yarn.scheduler.capacity.resource-calculator`. Still doesn't work. Same number of mappers. `yarn.scheduler.maximum-allocation-mb` is 17GB. Is there a way to say run 2 mappers per vCore. That should solve the issue right? – pratpor Jun 20 '16 at 08:22
  • I was just asking for DominantResourceCalculator. If you don't set it, then YARN only considers memory, and ignores cpu cores for container creation. So your problem has nothing to do with cores, your container creation is limited by RAM only. Hmmm how many mappers were created during your job ? – Nicomak Jun 20 '16 at 08:47
  • Ohkay. So `yarn.scheduler.capacity.resource-calculator` was not set earlier and I tried it with `DominantResourceCalculator`. But anyways, 25-26 mappers are getting created. And I doubt that this number is being decided based on memory. Out of 119GB memory available, at its peak time with 25-26 mappers, my cluster uses only 59GB. 60GB RAM is free but not used. – pratpor Jun 20 '16 at 09:19
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/115091/discussion-between-pratpor-and-nicomak). – pratpor Jun 20 '16 at 10:30
  • hello pratpor, how did you solve your issue. I am facing a similar one here, https://stackoverflow.com/questions/55255515/suggestions-required-in-increasing-utilization-of-yarn-containers-on-our-discove – akash sharma Mar 22 '19 at 07:37

1 Answer


If your input data is in 26 splits, YARN will create 26 mappers to process those splits in parallel.

If you have 7 nodes running 2 GB mappers for 26 splits, the distribution should be something like:

  • Node1 : 4 mappers => 8 GB
  • Node2 : 4 mappers => 8 GB
  • Node3 : 4 mappers => 8 GB
  • Node4 : 4 mappers => 8 GB
  • Node5 : 4 mappers => 8 GB
  • Node6 : 3 mappers => 6 GB
  • Node7 : 3 mappers => 6 GB
  • Total : 26 mappers => 52 GB

So the total memory used by your MapReduce job, if all mappers run at the same time, will be 26 × 2 = 52 GB. If you add the memory used by the reducer(s) and the ApplicationMaster container, you can reach your 59 GB at some point, as you said.

If this is the behaviour you are witnessing, and the job is finished after those 26 mappers, then there is nothing wrong. You only need around 60 GB to complete your job, spreading tasks across all your nodes without having to wait for container slots to free up. The other ~60 GB are simply idle because you don't need them. Increasing container or heap sizes just to use all the memory won't necessarily improve performance.

Edit:

However, if you still have lots of mappers waiting to be scheduled, then maybe it's because your installation is configured to calculate container allocation using vcores as well. This is not the default in Apache Hadoop, but it can be configured:

yarn.scheduler.capacity.resource-calculator: The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default, org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, only uses Memory, while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU, etc. A Java ResourceCalculator class name is expected.

Since you set yarn.nodemanager.resource.cpu-vcores to 4, and since each mapper uses 1 vcore by default, you can only run 4 mappers per node at a time.

In that case you can double your value of yarn.nodemanager.resource.cpu-vcores to 8. It's an arbitrary value, but doubling it should double the number of mappers that can run concurrently.
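
As a rough sketch, the two settings discussed above would look something like this (property names as in stock Apache Hadoop; `capacity-scheduler.xml` only applies if you run the CapacityScheduler, and on Cloudera the equivalents would normally be set through Cloudera Manager):

```xml
<!-- capacity-scheduler.xml: make the scheduler compare CPU as well as memory -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<!-- yarn-site.xml: advertise 8 vcores so two 1-vcore mappers can share each physical core -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
```

Whether oversubscribing cores like this actually helps depends on whether the job is CPU-bound or memory/IO-bound, as discussed in the comments below.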

Nicomak
  • Hey. Thanks for the reply. I am actually running a sqoop job for around 1.7 TB of data. That's a lot and I am giving around 2000 mappers for the job (`--m 2000`). It takes up to 1.5 - 2 hours to complete the task with the current configuration, but with 60 GB of memory remaining unused throughout the job because only 26 mappers (with `2 GB`) can run at a time. This clearly gives me the impression that it is being limited by the number of vcores available. I will check more and update here if I find any better solution than increasing map task memory to 4 GB. – pratpor Jun 20 '16 at 11:34
  • Then increase yarn.nodemanager.resource.cpu-vcores to 8. It's just an arbitrary value, you can double it, and if your calculator was really limiting because of CPU, it should double the number of mappers – Nicomak Jun 20 '16 at 12:12
  • Yes I tried that and it actually works to double the number of mappers. Ran a test job (pi) and found that it degrades the job performance, as now 2 mappers will be running per core sharing resources, and the pi job is more computation-based than memory-based. So all I understood is, it's actually a trade-off here. For jobs not requiring much memory, we can increase the mapper count (vcores); else it's safer to just increase the map task memory to 4 GB. Thanks for all the help. – pratpor Jun 20 '16 at 13:00
  • Well it's quite interesting that your distribution of YARN automatically uses vcores. In vanilla Apache Hadoop it does not by default. You are right, there is sometimes a trade-off. In your case it's not much of a trade-off, because (1) you have 4 CPUs on a host, so if you use 8 vcores it doesn't even make processing faster, it's just multi-threading, (2) you are doing lots of disk reads or writes with sqoop, so if you don't have SSDs or many HDDs per node, you are killing your performance because of seek-time latency by using more mappers. – Nicomak Jun 20 '16 at 13:33