
Unlike Hortonworks or Cloudera, AWS EMR does not seem to provide any GUI for changing the XML configurations of the various Hadoop ecosystem frameworks.

After logging into my EMR namenode and running a quick

find / -iname yarn-site.xml

I found yarn-site.xml at /etc/hadoop/conf.empty/yarn-site.xml and capacity-scheduler.xml at /etc/hadoop/conf.empty/capacity-scheduler.xml.

Note, however, that these are under conf.empty, so I suspect they might not be the locations that yarn-site.xml and capacity-scheduler.xml are actually read from.
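One way to check that suspicion is to see where the usual configuration path resolves to. This is only a sketch: it assumes the daemons read from /etc/hadoop/conf (or whatever HADOOP_CONF_DIR is set to), which the question does not confirm, and `resolve_conf_dir` is a made-up helper name.

```shell
# Print the real directory behind a Hadoop conf path.
# Defaults to /etc/hadoop/conf, which is an assumption about this cluster.
resolve_conf_dir() {
  conf_path="${1:-/etc/hadoop/conf}"
  # readlink -f (GNU coreutils) follows the whole symlink chain.
  readlink -f "$conf_path"
}

# On the master node, something like:
#   resolve_conf_dir /etc/hadoop/conf
#   echo "$HADOOP_CONF_DIR"
```

If the printed path ends in conf.empty, the files found above are the ones in use.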

I understand that I can change these configurations when creating a cluster, but what I need to know is how to change them without tearing down the cluster.

I just want to play around with scheduling properties and try out different schedulers to identify what might work well with my Spark applications.

Thanks in advance!

Prasad Khode
Kumar Vaibhav
  • You can edit the actual file with a text editor like `vim`, or you can follow these [steps](https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration) to create a new config directory and reload your cluster, or you can simply type `man yarn` on the cluster over SSH and it tells you right there how to do it. – gold_cy Apr 14 '17 at 21:40
  • The problem with editing directly seems to be that the exact same files seem to be in multiple places. – Kumar Vaibhav Apr 14 '17 at 21:44
  • Check the `man yarn` pages it explicitly tells you how to configure the options – gold_cy Apr 14 '17 at 21:46
  • 2
    @Kumar Vaibhav, AWS EMR actually uses Puppet to deploy the Hadoop configurations, and hacking it requires a lot of understanding of the Puppet framework. I feel this is a very awkward way to change configs; I am stuck at the same point as you and have not found the answer yet. Just editing the yarn-site.xml or mapred-site.xml file on the master node is not enough when Spark is configured as yarn-client, and restarting the YARN services has no effect. – S.K. Venkat Jul 15 '17 at 09:40

1 Answer


Well, yarn-site.xml and capacity-scheduler.xml are indeed in the correct location (/etc/hadoop/conf.empty/). On a running cluster, editing them on the master node and restarting the YARN ResourceManager daemon will change the scheduler.
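Before editing in place, it may be worth keeping a copy of the file so the change can be rolled back. A minimal sketch follows; `backup_conf` is a hypothetical helper, and on the cluster it would need to run with sudo because the files live under /etc. The upstart restart commands in the comments are the ones quoted in the comments on this answer; the systemctl form is an assumption about newer EMR releases.

```shell
# Copy a config file to a timestamped backup and print the backup path.
backup_conf() {
  src="$1"
  dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  # -p keeps the original mode and timestamps on the copy.
  cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Possible workflow on the master node (run as a user with sudo rights):
#   backup_conf /etc/hadoop/conf.empty/yarn-site.xml
#   sudo vim /etc/hadoop/conf.empty/yarn-site.xml
#   sudo stop hadoop-yarn-resourcemanager
#   sudo start hadoop-yarn-resourcemanager
# Assumption: on systemd-based releases the restart may instead be
#   sudo systemctl restart hadoop-yarn-resourcemanager
```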

When spinning up a new cluster, you can use the EMR Configurations API to set the appropriate values: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

For example, specify the appropriate values under the capacity-scheduler and yarn-site classifications in your configuration, and EMR will write those values into the corresponding XML files.
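As a rough illustration, a configurations file with those two classifications could look like the one written below. The property values are only examples chosen for the sketch, and `write_emr_configurations` and the default file name are hypothetical; check the property names you need against the Hadoop docs for your release.

```shell
# Write an example EMR configurations file (default: configurations.json).
write_emr_configurations() {
  out="${1:-configurations.json}"
  cat > "$out" <<'EOF'
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"
    }
  },
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }
]
EOF
}

# The file can then be passed at cluster creation, alongside your usual
# create-cluster arguments:
#   aws emr create-cluster ... --configurations file://configurations.json
```

To try a different scheduler, the scheduler class value is the line to swap, for instance to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.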

Edit (Sep 4, 2019): With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.

Please see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html
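With the AWS CLI, the reconfiguration described on that page is done by passing a JSON file to `aws emr modify-instance-groups`. The sketch below only writes such a file; the instance group ID, the cluster ID placeholder, the file name, and the `write_instance_group_override` helper are all hypothetical, and the FairScheduler value is just an example.

```shell
# Write a reconfiguration file for one instance group.
# $1 = instance group ID, $2 = output file (default: instanceGroups.json)
write_instance_group_override() {
  group_id="$1"
  out="${2:-instanceGroups.json}"
  cat > "$out" <<EOF
[
  {
    "InstanceGroupId": "${group_id}",
    "Configurations": [
      {
        "Classification": "yarn-site",
        "Properties": {
          "yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"
        },
        "Configurations": []
      }
    ]
  }
]
EOF
}

# Look up the instance group IDs, then apply the override:
#   aws emr list-instance-groups --cluster-id <your-cluster-id>
#   write_instance_group_override <your-instance-group-id>
#   aws emr modify-instance-groups --cluster-id <your-cluster-id> \
#       --instance-groups file://instanceGroups.json
```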

jc mannem
  • 3
    Using the EMR Configurations API helps only at cluster creation time, to set the configs to our requirements. Afterwards, it is very difficult to push configs to the core/task nodes when we want to experiment with something. – S.K. Venkat Jul 15 '17 at 09:55
  • Pushing configs on the fly is not supported yet on EMR. AWS has services such as SSM that you can use on EC2 to run any commands you need on a cluster. A blog post covers this briefly, though in the different context of a custom AMI scenario: https://aws.amazon.com/blogs/big-data/create-custom-amis-and-push-updates-to-a-running-amazon-emr-cluster-using-amazon-ec2-systems-manager/ – jc mannem Jan 10 '18 at 23:38
  • 2
    To see the YARN RM status: `sudo status hadoop-yarn-resourcemanager`. To restart the YARN RM: `sudo stop hadoop-yarn-resourcemanager`, then `sudo start hadoop-yarn-resourcemanager`. – ski_squaw Feb 02 '18 at 02:15