I cannot find an answer to this question.

How can I gracefully stop the YARN role on a data node, waiting until all jobs running on that node have finished successfully?

I know that in Cloudera Manager you can decommission the YARN role and then stop it. But if I decommission the YARN role, the running jobs fail with a killed or crashed status.

Is this a safe way to stop the YARN role on a data node?

Is this a graceful YARN role shutdown, or is there another way to do this? All jobs end up with killed status after the YARN role is decommissioned.

2 Answers

This is poorly documented on the Apache website for Hadoop 3.3:

Create an XML file listing the NodeManagers you wish to decommission:

<?xml version="1.0"?>
<hosts>
  <host><name>host1</name></host> <!-- normal 'kill' -->
  <host><name>host2</name><timeout>123</timeout></host> <!-- allows jobs 123 seconds to finish -->
  <host><name>host3</name><timeout>-1</timeout></host> <!-- no time limit: waits until jobs finish -->
</hosts>
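
Writing this file by hand gets tedious with many hosts. As a minimal sketch, a small Bash helper could generate it. Note that `write_exclude_file` and the `host:timeout` argument convention are my own assumptions for illustration, not part of Hadoop:

```shell
#!/usr/bin/env bash
# Sketch: build a YARN exclude file in the XML format shown above.
# write_exclude_file is a hypothetical helper, not a Hadoop tool.
# Usage: write_exclude_file OUTPUT_PATH HOST[:TIMEOUT_SECS] ...
write_exclude_file() {
  local out="$1"
  shift
  local entry host timeout
  {
    echo '<?xml version="1.0"?>'
    echo '<hosts>'
    for entry in "$@"; do
      host="${entry%%:*}"
      if [ "$entry" = "$host" ]; then
        # No timeout given: emit a plain entry (the "normal" case above).
        echo "  <host><name>${host}</name></host>"
      else
        # Everything after the first colon is the timeout in seconds.
        timeout="${entry#*:}"
        echo "  <host><name>${host}</name><timeout>${timeout}</timeout></host>"
      fi
    done
    echo '</hosts>'
  } > "$out"
}
```

For example, `write_exclude_file /tmp/yarn.exclude.xml host1 host2:123 host3:-1` (hypothetical path and hosts) would reproduce the three entries above.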

Update your config (yarn-site.xml) to point to this file (no restart required):

yarn.resourcemanager.nodes.exclude-path=[path/to/exclude/file]

Run the update to initiate the decommission:

yarn rmadmin -refreshNodes 

Alternatively, you could set a graceful timeout for all nodes:

yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs

Alternatively, you can set a graceful timeout manually on the command line:

yarn rmadmin -refreshNodes -g [timeout in seconds] -client
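
To wait until a node has finished draining, one option is to poll the node list. The sketch below assumes that `yarn node -list -all` prints the node ID (host:port) in the first column and the node state in the second. The helper names are hypothetical and the column layout may differ between versions, so check the output on your cluster first:

```shell
#!/usr/bin/env bash
# Sketch: wait until a NodeManager leaves the DECOMMISSIONING state.
# Both functions are hypothetical helpers, not part of Hadoop.

# Reads node-list output on stdin. Succeeds if the given host appears on
# a line whose second column is DECOMMISSIONING.
node_is_decommissioning() {
  local host="$1"
  # Assumption: column 1 is host:port, column 2 is the node state.
  awk -v h="$host" '
    index($1, h ":") == 1 && $2 == "DECOMMISSIONING" { found = 1 }
    END { exit !found }
  '
}

# Polls the ResourceManager until the host is no longer draining.
wait_for_decommission() {
  local host="$1" interval="${2:-30}"
  if ! command -v yarn >/dev/null 2>&1; then
    echo "yarn CLI not found on PATH" >&2
    return 2
  fi
  while yarn node -list -all 2>/dev/null | node_is_decommissioning "$host"; do
    echo "$host is still draining containers; checking again in ${interval}s"
    sleep "$interval"
  done
  echo "$host is no longer in DECOMMISSIONING state"
}
```

You would call something like `wait_for_decommission host2 60` after `yarn rmadmin -refreshNodes`. The loop only tells you the node left the DECOMMISSIONING state, so you should still confirm that it ended up DECOMMISSIONED before stopping the role.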
Matt Andruff
  • Thank you for your response and your effort to help. In Cloudera Hadoop the process is a little different. I found the solution and tested it on our cluster. I posted all the steps above. I hope it will help other people in the future. – user2784340 Feb 09 '22 at 09:48
  • I'm glad you were able to solve the issue. I'm sorry I didn't frame the question in terms of Cloudera Manager. Don't forget to mark your own answer correct if that is the answer. If you found my answer helpful please upvote it. – Matt Andruff Feb 09 '22 at 14:30
YARN graceful decommission waits for running jobs to complete. You can set a timeout so that YARN waits up to x seconds before decommissioning the node. If no jobs are still running before the x seconds are up, YARN decommissions the node right away without waiting for the timeout to expire.

CM -> Clusters -> yarn -> Configuration -> in the search bar, enter yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs

Set the value, save the configuration, and restart to deploy the configs.

To decommission one or more specific hosts:

CM -> Clusters -> yarn -> Instances (select the hosts that you want to decommission)

Click -> Actions for selected hosts -> Decommission

If you want to decommission all the roles of a host, follow this doc: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_mc_host_maint.html#decomm_host