1) Change Scheduler to FairScheduler
Hadoop distributions generally use the CapacityScheduler by default (Cloudera uses the FairScheduler as its default scheduler). Add this property to yarn-site.xml:
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
2) Set Default Queue
By default, the Fair Scheduler creates a queue per user. For example, if three different users submit jobs, three individual queues are created and the resources are shared among those three queues. Disable this behavior by adding this property to yarn-site.xml:
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
This ensures that jobs which do not name a queue all go into a single queue, named default.
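Note that user-as-default-queue only affects applications that do not name a queue. An application that explicitly requests one (for example through mapreduce.job.queuename for MapReduce jobs) can still create a new queue, because yarn.scheduler.fair.allow-undeclared-pools defaults to true. If you also want such applications redirected to the default queue, a sketch of the extra yarn-site.xml property follows; according to the Fair Scheduler documentation, this property is ignored when the allocation file defines a queue placement policy.

```xml
<!-- yarn-site.xml: when false, an application that would be placed in a queue
     not declared in the allocation file is placed in the "default" queue instead -->
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>
```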
3) Restrict Maximum Applications
Now that jobs are limited to one default queue, restrict the maximum number of applications that can run in that queue to 1.
Create a file named fair-scheduler.xml under $HADOOP_CONF_DIR and add these entries:
<allocations>
<queueMaxAppsDefault>1</queueMaxAppsDefault>
</allocations>
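queueMaxAppsDefault sets the limit for every queue that does not override it. If you would rather cap only the default queue, the allocation file also accepts a per-queue maxRunningApps element. A minimal sketch, assuming all applications end up in the queue named default as configured in step 2:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Limits only root.default to one running application;
       other queues stay unlimited unless configured otherwise -->
  <queue name="default">
    <maxRunningApps>1</maxRunningApps>
  </queue>
</allocations>
```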
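Also, add this property to yarn-site.xml, replacing $HADOOP_CONF_DIR with the absolute path of your configuration directory (YARN does not expand shell-style variables written this way):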
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>$HADOOP_CONF_DIR/fair-scheduler.xml</value>
</property>
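For reference, the yarn-site.xml changes from the three steps might look like this when combined inside the file's configuration root element, alongside any properties already present. The allocation-file path below is a hypothetical placeholder; substitute the absolute path of your own configuration directory.

```xml
<configuration>
  <!-- Step 1: switch the ResourceManager to the Fair Scheduler -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <!-- Step 2: do not create a separate queue for each submitting user -->
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>false</value>
  </property>
  <!-- Step 3: point the scheduler at the allocation file (hypothetical placeholder path) -->
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/path/to/hadoop/conf/fair-scheduler.xml</value>
  </property>
</configuration>
```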
Restart the YARN services after adding these properties.
When multiple applications are submitted, the first one to be ACCEPTED becomes the active application and the rest are queued as pending applications. These pending applications remain in the ACCEPTED state until the RUNNING application is FINISHED. The active application is allowed to use all the available resources.
Reference: Hadoop: Fair Scheduler