
I have a job that I trigger on EMR. The master triggers the mappers; once they are done, it loads a heavyweight operation into memory and eventually dumps the result out. Right now the job fails on the cluster after a few minutes because it runs out of heap space: by default EMR sets about 1000m on the master.

I tried the exact bootstrap action below, but it did not work; the heap is still set to 1000m:

--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args -s,mapred.child.java.opts=Xmx4000m

user2655578

1 Answer


EMR provides a specific way to set the heap size of the NameNode: use the following bootstrap action while launching the cluster:

--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons --args --namenode-heap-size=4096
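For context, with the old elastic-mapreduce Ruby CLI a full launch carrying this bootstrap action might look roughly like the sketch below; the cluster name, instance type, and instance count are illustrative placeholders, not values from the question:

```shell
# Sketch: launching an EMR cluster with a 4 GB NameNode heap via the
# configure-daemons bootstrap action (old elastic-mapreduce Ruby CLI).
# Cluster name and instance settings are placeholders.
elastic-mapreduce --create --alive \
  --name "bigger-namenode-heap" \
  --instance-type m1.large \
  --num-instances 3 \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons \
  --args --namenode-heap-size=4096
```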

Alternatively, you can use a config file. Create an XML config file and upload it to S3:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx4096m</value>
  </property>
</configuration>
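Before uploading, it is worth sanity-checking the file locally, since a missing `-` in front of `Xmx` (as in the command from the question) silently produces an invalid JVM option. A rough check, assuming the file is named custom-heap-size.xml:

```shell
# Write the custom config locally (filename assumed from the answer)
# and confirm the value before uploading it to S3.
cat > custom-heap-size.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx4096m</value>
  </property>
</configuration>
EOF

# Extract the value with sed; the leading dash must be present.
heap=$(sed -n 's:.*<value>\(.*\)</value>.*:\1:p' custom-heap-size.xml)
echo "$heap"
```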

Now launch the cluster with the following bootstrap action:

--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "--mapred-config-file,s3:///custom-heap-size.xml"
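Put together with the old Ruby CLI, the launch might look like this sketch; the cluster name and instance settings are placeholders, and the bucket name in the S3 path is left out here just as it is above, so substitute your own bucket:

```shell
# Sketch: launching a cluster while pointing configure-hadoop at the
# custom mapred config file uploaded to S3. Bucket name intentionally
# left blank; fill in your own.
elastic-mapreduce --create --alive \
  --name "bigger-child-heap" \
  --instance-type m1.large \
  --num-instances 3 \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "--mapred-config-file,s3:///custom-heap-size.xml"
```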

Amar
  • Does the namenode-heap-size argument also apply to the master and slaves? – user2655578 Nov 01 '13 at 02:31
  • This parameter is only to set the heap size for the NameNode of HDFS. Read about NameNode here : http://wiki.apache.org/hadoop/NameNode – Amar Nov 04 '13 at 18:21