Is preemption supported with Tez and the YARN Fair Scheduler?

Question

We recently switched our 10-node cluster from MapReduce to Tez, and we have been experiencing resource management issues ever since. It seems that preemption does not work as expected:

A very resource-hungry job (job1) arrives and gets all free resources.
A second job (job2) arrives and waits for resources to be freed by job1.
job2 gets a very small share of the resources (5%) for a long time; its share keeps increasing very slowly but most of the time never reaches the fair share.

I'm assuming the preemption mechanism used by the YARN Fair Scheduler is not working as it should, and resources only get assigned to job2 when some job1 containers finish.

I've looked through the Tez documentation, and it seems Tez was developed with the Capacity Scheduler as the de facto scheduler, but I can't find any help for the Fair Scheduler.

Some configuration variables in use that may help:

hive.server2.tez.default.queues=default
hive.server2.tez.initialize.default.sessions=false
hive.server2.tez.session.lifetime=162h
hive.server2.tez.session.lifetime.jitter=3h
hive.server2.tez.sessions.init.threads=16
hive.server2.tez.sessions.per.default.queue=10
hive.tez.auto.reducer.parallelism=false
hive.tez.bucket.pruning=false
hive.tez.bucket.pruning.compat=true
hive.tez.container.max.java.heap.fraction=0.8
hive.tez.container.size=-1
hive.tez.cpu.vcores=-1
hive.tez.dynamic.partition.pruning=true
hive.tez.dynamic.partition.pruning.max.data.size=104857600
hive.tez.dynamic.partition.pruning.max.event.size=1048576
hive.tez.enable.memory.manager=true
hive.tez.exec.inplace.progress=true
hive.tez.exec.print.summary=false
hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
hive.tez.input.generate.consistent.splits=true
hive.tez.log.level=INFO
hive.tez.max.partition.factor=2.0
hive.tez.min.partition.factor=0.25
hive.tez.smb.number.waves=0.5
hive.tez.task.scale.memory.reserve-fraction.min=0.3
hive.tez.task.scale.memory.reserve.fraction=-1.0
hive.tez.task.scale.memory.reserve.fraction.max=0.5
yarn.scheduler.fair.preemption=true
yarn.scheduler.fair.preemption.cluster-utilization-threshold=0.7
yarn.scheduler.maximum-allocation-mb=32768
yarn.scheduler.maximum-allocation-vcores=4
yarn.scheduler.minimum-allocation-mb=2048
yarn.scheduler.minimum-allocation-vcores=1
yarn.resourcemanager.scheduler.address=${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.resourcemanager.scheduler.client.thread-count=50
yarn.resourcemanager.scheduler.monitor.enable=false
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

Preemption doesn't occur unless you have separate queues. None of the YARN schedulers support preemption within a queue. — tk421, Jun 27 '18 at 17:17
If my understanding is correct, the queues are created with the user name because of the queuePlacementPolicy's user rule. See: https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format — mazelx, Jun 27 '18 at 22:44
Yes, different users. Some of them run multiple concurrent jobs, but it also happens with a single job. — mazelx, Jun 28 '18 at 07:36
Is that queue "root." defined in the fair scheduler's XML? If it is, are the relevant fields `weight`, `fairSharePreemptionTimeout`, `fairSharePreemptionThreshold` defined in the queue? If not, then it's unlikely preemption has been defined for that queue. — tk421, Jun 29 '18 at 19:04
The user queues are not defined in the scheduler's XML, but preemption is set for the root queue. The child queues should inherit the parent's properties, right? — mazelx, Jul 02 '18 at 08:47
No, if the queue is not EXPLICITLY defined in the XML, you cannot preempt to/from it. A user queue is an `ephemeral` or `ad-hoc` queue and its resources are measured as part of its parent queue. If you look at the YARN RM UI and click on "Scheduler" you will see the queue structure. — tk421, Jul 02 '18 at 16:25
You should look at http://blog.cloudera.com/blog/2017/02/untangling-apache-hadoop-yarn-part-5-using-fairscheduler-queue-properties/ and https://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/ for more information. — tk421, Jul 02 '18 at 16:27
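
As a rough sketch of what the comments above describe, an allocation file that declares the queues explicitly and sets the preemption fields on each of them might look like the following. The queue names `alice` and `bob` and all numeric values are hypothetical placeholders. The element names follow the Hadoop 2.7.4 Fair Scheduler allocation file format linked above. Nothing in this thread confirms that this resolves the behaviour described in the question.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical queue for one user, declared explicitly under root
       instead of being created ad hoc by the placement policy. -->
  <queue name="alice">
    <weight>1.0</weight>
    <!-- Seconds the queue may stay below its fair share threshold
         before it tries to preempt containers from other queues. -->
    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
    <!-- Fraction of the fair share below which the queue counts as starved. -->
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
  </queue>

  <!-- Second hypothetical user queue with the same assumed settings. -->
  <queue name="bob">
    <weight>1.0</weight>
    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
  </queue>

  <queuePlacementPolicy>
    <!-- Use the queue named at submission time, if any. -->
    <rule name="specified" />
    <!-- Place the app in the queue named after the submitting user,
         but only if that queue is declared above (create="false"). -->
    <rule name="user" create="false" />
    <!-- Otherwise fall back to the default queue. -->
    <rule name="default" queue="default" />
  </queuePlacementPolicy>
</allocations>
```

With `create="false"` on the `user` rule, users without a declared queue fall through to `default` rather than getting an ad-hoc queue, which keeps every running application in a queue that appears in the XML.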
