We recently switched our 10-node cluster from MapReduce to Tez, and we have been experiencing issues with resource management ever since. It seems that preemption does not work as expected:

  1. A very resource-hungry job (job1) arrives and takes all free resources.
  2. A second job (job2) arrives and waits for resources to be freed by job1.
  3. job2 gets a very small share of resources (about 5%) for a long time; its share grows very slowly and most of the time never reaches the fair share.

I'm assuming the preemption mechanism of the YARN Fair Scheduler is not working as it should, and resources only get assigned to job2 when some job1 containers finish.

I've looked at the Tez documentation, and I suspect Tez was developed with the Capacity Scheduler as its de facto scheduler, but I can't find any help for the Fair Scheduler.

Some configuration variables in use that may help:

hive.server2.tez.default.queues=default
hive.server2.tez.initialize.default.sessions=false
hive.server2.tez.session.lifetime=162h
hive.server2.tez.session.lifetime.jitter=3h
hive.server2.tez.sessions.init.threads=16
hive.server2.tez.sessions.per.default.queue=10
hive.tez.auto.reducer.parallelism=false
hive.tez.bucket.pruning=false
hive.tez.bucket.pruning.compat=true
hive.tez.container.max.java.heap.fraction=0.8
hive.tez.container.size=-1
hive.tez.cpu.vcores=-1
hive.tez.dynamic.partition.pruning=true
hive.tez.dynamic.partition.pruning.max.data.size=104857600
hive.tez.dynamic.partition.pruning.max.event.size=1048576
hive.tez.enable.memory.manager=true
hive.tez.exec.inplace.progress=true
hive.tez.exec.print.summary=false
hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
hive.tez.input.generate.consistent.splits=true
hive.tez.log.level=INFO
hive.tez.max.partition.factor=2.0
hive.tez.min.partition.factor=0.25
hive.tez.smb.number.waves=0.5
hive.tez.task.scale.memory.reserve-fraction.min=0.3
hive.tez.task.scale.memory.reserve.fraction=-1.0
hive.tez.task.scale.memory.reserve.fraction.max=0.5
yarn.scheduler.fair.preemption=true
yarn.scheduler.fair.preemption.cluster-utilization-threshold=0.7
yarn.scheduler.maximum-allocation-mb=32768
yarn.scheduler.maximum-allocation-vcores=4
yarn.scheduler.minimum-allocation-mb=2048
yarn.scheduler.minimum-allocation-vcores=1
yarn.resourcemanager.scheduler.address=${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.resourcemanager.scheduler.client.thread-count=50
yarn.resourcemanager.scheduler.monitor.enable=false
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
mazelx
  • Preemption doesn't occur unless you have separate queues. None of the YARN schedulers support preemption within a queue. – tk421 Jun 27 '18 at 17:17
  • If my understanding is correct, the queues are being created with the user name because of the queuePlacementPolicy's user rule. See: https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format – mazelx Jun 27 '18 at 22:44
  • Do you know if the jobs are being run by distinct users? – tk421 Jun 28 '18 at 02:16
  • Yes, different users. Some of them run multiple concurrent jobs, but it also happens with a single job. – mazelx Jun 28 '18 at 07:36
  • Users get a queue as root. – mazelx Jun 28 '18 at 07:39
  • Is that queue "root." defined in the fair scheduler's XML? If it is, are the relevant fields `weight`, `fairSharePreemptionTimeout`, `fairSharePreemptionThreshold` defined in the queue? If not, then it's unlikely preemption has been defined for that queue. – tk421 Jun 29 '18 at 19:04
  • The user queues are not defined in the scheduler's XML, but preemption is set for the root queue. The child queues should inherit the parent's properties, right? – mazelx Jul 02 '18 at 08:47
  • No, if the queue is not EXPLICITLY defined in the XML, you cannot preempt to/from it. A user queue is an `ephemeral` or `ad-hoc` queue and its resources are measured as part of its parent queue. If you look at the YARN RM UI and click on "Scheduler" you will see the queue structure. – tk421 Jul 02 '18 at 16:25
  • You should look at http://blog.cloudera.com/blog/2017/02/untangling-apache-hadoop-yarn-part-5-using-fairscheduler-queue-properties/ and https://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/ for more information. – tk421 Jul 02 '18 at 16:27
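
For illustration, the suggestion in the comments above (declare the user queues explicitly and give each one its preemption fields) might look roughly like the allocation file below. This is only a sketch under assumptions: the queue names `alice` and `bob` are hypothetical stand-ins for real user names, and the weight, timeout, and threshold values are made up and would need tuning. The element names come from the Fair Scheduler allocation file format linked earlier in the thread.

```xml
<?xml version="1.0"?>
<!-- Sketch of a Fair Scheduler allocation file.
     Queue names and all numeric values are illustrative assumptions. -->
<allocations>
  <!-- Hypothetical user queue, declared explicitly rather than created ad hoc -->
  <queue name="alice">
    <weight>1.0</weight>
    <!-- Seconds the queue may stay below the threshold before it is allowed to preempt -->
    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
    <!-- Fraction of its fair share below which the queue is considered starved -->
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
  </queue>

  <!-- Second hypothetical user queue with the same assumed settings -->
  <queue name="bob">
    <weight>1.0</weight>
    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
  </queue>

  <queuePlacementPolicy>
    <!-- Place each app in the queue named after its user, but only if that queue is declared above -->
    <rule name="user" create="false"/>
    <!-- Everything else falls back to the default queue -->
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
```

With `create="false"` on the user rule, users without a declared queue would land in the default queue instead of getting an ad-hoc one, so whether this layout fits depends on how many distinct users submit jobs.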
