I wrote the following shell script to configure the YARN scheduler, but it doesn't work: creation of the Dataproc cluster fails when I pass this script as an input argument.

Do you have any idea how to fix this?

Below is the script:

#!/usr/bin/env bash

echo "<allocations>" >> /etc/hadoop/conf/fair-scheduler.xml
echo "  <userMaxAppsDefault>999</userMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
echo "  <queueMaxAppsDefault>999</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml

sed -i '$ d' /etc/hadoop/conf/yarn-site.xml

echo "  <property>" >> /etc/hadoop/conf/yarn-site.xml
echo "    <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
echo "    <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
echo "  </property>" >> /etc/hadoop/conf/yarn-site.xml
echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml

systemctl restart hadoop-yarn-resourcemanager.service
Igor Dvorzhak
scalacode

1 Answer

You need to use a Dataproc initialization action to configure the YARN Fair Scheduler on Dataproc.

See this answer for an example of how it could be done: https://stackoverflow.com/a/49693693/3227693
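For illustration, here is a minimal sketch of what such an initialization action might look like. It is not taken from the linked answer. It rests on an assumption that the question does not confirm: that cluster creation fails because the script runs on every node, while the hadoop-yarn-resourcemanager service exists only on the master, so the final restart returns a non-zero status on workers. The helper names write_allocations and add_yarn_property and the CONF_DIR variable are hypothetical; the node role is read from the dataproc-role metadata attribute.

```shell
#!/usr/bin/env bash
# Sketch of a Dataproc initialization action (assumptions noted inline).
set -euo pipefail

# Hypothetical override for testing; on Dataproc the default path applies.
CONF_DIR="${CONF_DIR:-/etc/hadoop/conf}"

# Write the allocation file in one step. Using '>' instead of '>>'
# keeps the file well-formed if the script runs more than once.
write_allocations() {
  local file="$1"
  cat > "${file}" <<'EOF'
<allocations>
  <userMaxAppsDefault>999</userMaxAppsDefault>
  <queueMaxAppsDefault>999</queueMaxAppsDefault>
</allocations>
EOF
}

# Insert a <property> block just before the line holding </configuration>,
# instead of deleting whatever happens to be the last line of the file.
# Assumes the closing tag sits on its own line.
add_yarn_property() {
  local file="$1" name="$2" value="$3" tmp
  tmp="$(mktemp)"
  awk -v name="${name}" -v value="${value}" '
    /<\/configuration>/ {
      print "  <property>"
      print "    <name>" name "</name>"
      print "    <value>" value "</value>"
      print "  </property>"
    }
    { print }
  ' "${file}" > "${tmp}"
  cat "${tmp}" > "${file}"   # overwrite in place, keeping owner and mode
  rm -f "${tmp}"
}

# Node role from instance metadata; empty when not running on Dataproc.
ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role 2>/dev/null || true)"

# Assumption: only the master runs the ResourceManager, so workers skip this.
if [[ "${ROLE}" == "Master" ]]; then
  write_allocations "${CONF_DIR}/fair-scheduler.xml"
  add_yarn_property "${CONF_DIR}/yarn-site.xml" \
    "yarn.scheduler.fair.allocation.file" "${CONF_DIR}/fair-scheduler.xml"
  systemctl restart hadoop-yarn-resourcemanager.service
fi
```

The script would be uploaded to a Cloud Storage bucket and passed to gcloud dataproc clusters create through the --initialization-actions flag. Note that the allocation file is only read when the ResourceManager is actually running the Fair Scheduler, so yarn.resourcemanager.scheduler.class may also need to be set to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler if the cluster uses a different scheduler.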

Igor Dvorzhak