61

I tried installing Hadoop following this document: http://hadoop.apache.org/common/docs/stable/single_node_setup.html. When I tried executing this command:

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 

I am getting the following exception:

java.lang.OutOfMemoryError: Java heap space

Please suggest a solution so that I can try out the example. The entire exception is listed below. I am new to Hadoop and I might have done something dumb. Any suggestion will be highly appreciated.

anuj@anuj-VPCEA13EN:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/12/11 17:38:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/11 17:38:22 INFO mapred.FileInputFormat: Total input paths to process : 7
11/12/11 17:38:22 INFO mapred.JobClient: Running job: job_local_0001
11/12/11 17:38:22 INFO util.ProcessTree: setsid exited with exit code 0
11/12/11 17:38:22 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e49dcd
11/12/11 17:38:22 INFO mapred.MapTask: numReduceTasks: 1
11/12/11 17:38:22 INFO mapred.MapTask: io.sort.mb = 100
11/12/11 17:38:22 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
11/12/11 17:38:23 INFO mapred.JobClient:  map 0% reduce 0%
11/12/11 17:38:23 INFO mapred.JobClient: Job complete: job_local_0001
11/12/11 17:38:23 INFO mapred.JobClient: Counters: 0
11/12/11 17:38:23 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1257)
    at org.apache.hadoop.examples.Grep.run(Grep.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Grep.main(Grep.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Anuj
  • What does your input file contain? – Tudor Dec 11 '11 at 14:12
  • I also suspect that the file has one huge line – David Gruzman Dec 11 '11 at 20:04
  • I'm having this same issue with Hadoop 1.0.0, input is as per getting started wiki page - http://wiki.apache.org/hadoop/GettingStartedWithHadoop. Tried all three solutions here, none of which seem to have any impact at all. – tbroberg Feb 07 '12 at 08:45
  • Solved my problem. Hadoop was giving the /etc/hadoop config directory precedence over the conf directory, which messed me up. I debugged this by modifying the bin/hadoop script to print out the java command line at the bottom instead of executing it. – tbroberg Feb 08 '12 at 02:57

16 Answers

80

For anyone using RPM or DEB packages, the documentation and common advice are misleading. These packages install Hadoop configuration files into /etc/hadoop, and these take priority over other settings.

The file /etc/hadoop/hadoop-env.sh sets the maximum Java heap memory for Hadoop. By default it is:

   export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"

This Xmx setting is too low. Simply change it to the following and rerun:

   export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
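A quick sanity check (a sketch, assuming the RPM/DEB layout described above) is to confirm which hadoop-env.sh is actually being read and what it sets, then rerun the example:

    # Show the client heap setting Hadoop will pick up from the packaged config
    grep HADOOP_CLIENT_OPTS /etc/hadoop/hadoop-env.sh
    # Rerun the failing example job
    bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'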
Zach Garner
  • I just had the exact same problem as the OP, and I was using the RPM package. This fixed the problem. Upvoted. – Aaron Burke Aug 12 '13 at 20:13
  • One could also set the property as final in mapred-site.xml, and then it will not be overwritten: <property><name>mapred.child.java.opts</name><value>-Xmx1024m</value><final>true</final></property> – sufinawaz Sep 30 '13 at 14:13
  • In Hadoop 2.x.x, hadoop-env.sh can be found at /etc/hadoop/conf/hadoop-env.sh – Pradeep Dec 23 '13 at 07:47
  • @Zach As developers we do not have access to change any conf file. Is there any way to set this property when submitting the Hadoop job? – Indrajeet Gour Jan 15 '16 at 12:21
40

You can assign more memory by editing the conf/mapred-site.xml file and adding the property:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

This will start the Hadoop JVMs with more heap space.
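The property goes inside the <configuration> element of conf/mapred-site.xml. A minimal sketch of applying the change with a Hadoop 1.x tarball layout (as noted in the comments, the daemons must be restarted for it to take effect):

    # After editing conf/mapred-site.xml, restart the daemons so the new setting is picked up
    bin/stop-all.sh
    bin/start-all.sh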

Tudor
  • @Anuj, try setting the number even higher. If not even 2048m is enough, there must be a problem with the implementation. – Tudor Dec 11 '11 at 13:03
  • I need to edit mapred-site.xml and re-execute the command, right? But that didn't work :( – Anuj Dec 11 '11 at 13:05
  • @Anuj: yes that's what you need to do. – Tudor Dec 11 '11 at 13:06
  • Don't forget that you need to stop and restart the daemons if you change the properties file. – Tudor Dec 12 '11 at 09:23
  • Where is this mapred-site.xml located? And does single user need this config file? – mythicalprogrammer Jan 06 '12 at 02:23
  • @Anuj you might try substantially less. You might not be able to allocate enough memory to start the job. – Carlos Rendon Nov 02 '12 at 20:11
  • @Tudor : Is it necessary that the java heap space usage cannot go beyond 2048M for a map process ? I run a job which is data structure intensive where the java heap usage is way beyond 2048M, does this mean that I will not be able to run this job ? – harry potter Jun 05 '13 at 13:36
  • @harry potter: Not really, you can set it as high as you want. The suggestion I made earlier doesn't apply if you're certain you actually need all that space. – Tudor Jun 05 '13 at 14:57
  • The file is now located in $HADOOP_HOME/etc/hadoop/mapred-site.xml – Ben Mathews May 16 '14 at 20:38
12

Another possibility is editing hadoop-env.sh, which contains export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS". Changing 128m to 1024m helped in my case (Hadoop 1.0.0.1 on Debian).
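A minimal sketch of making that change in place (assuming a tarball install with a conf/ directory; for packaged installs the file lives under /etc/hadoop instead):

    # Bump the client heap from 128m to 1024m, keeping a backup of the original file
    sed -i.bak 's/-Xmx128m/-Xmx1024m/' conf/hadoop-env.sh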

7

After trying many combinations, I finally concluded that the same error in my environment (Ubuntu 12.04, Hadoop 1.0.4) was due to two issues:

  1. Same as Zach Garner mentioned above.
  2. Don't forget to execute "ssh localhost" first. Believe it or not, a missing ssh connection can also produce a Java heap space error (see the sketch below).
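A sketch of the standard passwordless-ssh setup from the single-node setup guide, for anyone hitting point 2:

    # Generate a key with an empty passphrase and authorize it for logins to localhost
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    # Verify that this now works without a password prompt
    ssh localhost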
etlolap
  • I had this problem and "ssh localhost" worked for me! Why does hadoop need to run on ssh for standalone operation? – calvin Aug 04 '13 at 18:31
6

You need to make adjustments to mapreduce.{map|reduce}.java.opts and also to mapreduce.{map|reduce}.memory.mb.

For example:

  hadoop jar <jarName> <fqcn> \
      -Dmapreduce.map.memory.mb=4096 \
      -Dmapreduce.map.java.opts=-Xmx3686m

Here is a good resource with an answer to this question.
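For completeness, the reduce side has analogous settings; a sketch with illustrative values (keeping -Xmx at roughly 80-90% of the container size):

    hadoop jar <jarName> <fqcn> \
        -Dmapreduce.reduce.memory.mb=4096 \
        -Dmapreduce.reduce.java.opts=-Xmx3686m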

tworec
4

We faced the same situation.

Modifying the hadoop-env.sh worked out for me.

The export HADOOP_HEAPSIZE line is commented out by default; uncomment it and provide a heap size of your choice.

By default, the heap size assigned is 1000 MB.
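For illustration, the relevant line in conf/hadoop-env.sh looks like this once uncommented (the value here is just an example):

    # The maximum amount of heap to use, in MB. Default is 1000.
    export HADOOP_HEAPSIZE=2000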

AlexVogel
4

You can solve this problem by editing the file /etc/hadoop/hadoop-env.sh.

Hadoop was giving the /etc/hadoop config directory precedence over the conf directory.

I also ran into the same situation.

wufawei
2

I installed Hadoop 1.0.4 from the binary tarball and had the out-of-memory problem. I tried Tudor's, Zach Garner's, Nishant Nagwani's and Andris Birkmanis's solutions, but none of them worked for me.

Editing bin/hadoop to ignore $HADOOP_CLIENT_OPTS worked for me:

...
elif [ "$COMMAND" = "jar" ] ; then
     CLASS=org.apache.hadoop.util.RunJar
    #Line changed this line to avoid out of memory error:
    #HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    # changed to:
     HADOOP_OPTS="$HADOOP_OPTS "
...

I'm assuming that there is a better way to do this but I could not find it.
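One less invasive alternative (just a sketch, echoing the suggestion in the comment below) is to leave bin/hadoop untouched and override the variable in the calling shell; later -Xmx flags generally take precedence over the 128m prepended by hadoop-env.sh:

    # Override the client heap just for this shell session, instead of editing bin/hadoop
    export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
    bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'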

Brian C.
  • Are you running on a virtual machine? Most OOMs happen when running Hadoop on a VM with very little memory. Removing $HADOOP_CLIENT_OPTS is not a good idea in production, because you will have to keep a check on the memory being used. Give a bigger value rather than totally removing HADOOP_CLIENT_OPTS, e.g.: export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS" – Anuj Nov 14 '12 at 08:53
2

I hit the same exception on Ubuntu with Hadoop 1.1.1. The solution was simple: edit the shell variable $HADOOP_CLIENT_OPTS set by some init script. But it took a long time to find it =(

Odysseus
2

Run your job like the one below:

bin/hadoop jar hadoop-examples-*.jar grep -D mapred.child.java.opts=-Xmx1024M input output 'dfs[a-z.]+' 

The heap space is set to 32 MB or 64 MB by default. You can increase the heap space in the properties file, as Tudor pointed out, or you can change it for this particular job by setting the property on the command line, as above.

Nishant Nagwani
1

Make sure mapred.child.java.opts provides enough memory to run the MapReduce job. Also ensure that mapreduce.task.io.sort.mb is less than the heap given in mapred.child.java.opts.

Example:

 mapred.child.java.opts=-Xmx2048m
 mapreduce.task.io.sort.mb=100

Otherwise you'll hit the OOM issue even if HADOOP_CLIENT_OPTS in hadoop-env.sh is configured with enough memory.
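As a sketch, the same pair can also be passed per job on the command line, using the Hadoop 1.x names that already appear in the question's log (values are purely illustrative):

    # Per-job override: the 100 MB sort buffer fits comfortably inside the 2048 MB task heap
    bin/hadoop jar hadoop-examples-*.jar grep \
        -D mapred.child.java.opts=-Xmx2048m \
        -D io.sort.mb=100 \
        input output 'dfs[a-z.]+'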

S.K. Venkat
1

Configure the JVM heap size for your map and reduce processes. These sizes need to be less than the physical memory configured for the containers. As a general rule, they should be 80% of the YARN physical memory settings.

Configure mapreduce.map.java.opts and mapreduce.reduce.java.opts to set the map and reduce heap sizes respectively, e.g.

<property>  
   <name>mapreduce.map.java.opts</name>  
   <value>-Xmx1638m</value>
</property>
<property>  
   <name>mapreduce.reduce.java.opts</name>  
   <value>-Xmx3278m</value>
</property>
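Those -Xmx values are roughly 80% of assumed container sizes of 2048 MB (map) and 4096 MB (reduce); the matching container settings could be passed per job like this (my-app.jar and MyJob are placeholders for a job driver that uses ToolRunner):

    # Illustrative container sizes; the heap (-Xmx) values above are about 80% of these
    hadoop jar my-app.jar MyJob \
        -Dmapreduce.map.memory.mb=2048 \
        -Dmapreduce.reduce.memory.mb=4096 \
        input output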
Pravat Sutar
  • @Pravat_Sutar: Welcome to SO! Could you please edit your question a bit and add a bit more information, like where that config snippet come from? For more guidance, see also https://stackoverflow.com/help/how-to-answer – B--rian Aug 15 '19 at 06:58
0

Exporting the variables by running the following command worked for me:

. conf/hadoop-env.sh
0

On Ubuntu using a DEB install (at least for Hadoop 1.2.1), there is a /etc/profile.d/hadoop-env.sh symlink created to /etc/hadoop/hadoop-env.sh, which causes it to be loaded every time you log in. In my experience this is not necessary, because the /usr/bin/hadoop wrapper itself will eventually source it (through /usr/libexec/hadoop-config.sh). On my system I removed the symlink, and I no longer get weird issues when changing the value of -Xmx in HADOOP_CLIENT_OPTS (previously, every time that hadoop-env.sh script ran, the client options environment variable was updated while keeping the old value).
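If you want to do the same, a minimal sketch (paths as described above for the Hadoop 1.2.1 DEB install):

    # Check where the login-time symlink points, then remove it
    ls -l /etc/profile.d/hadoop-env.sh
    sudo rm /etc/profile.d/hadoop-env.sh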

borice
0

I ended up with a very similar issue last week. The input file I was using had a huge line in it that I could not even view; that single line was almost 95% of my file size (95% of 1 GB!). I would suggest you take a look at your input files first. You might have a malformed input file that you need to look into. Try increasing the heap space after you check the input file.
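A quick way to check for such a line (a sketch using plain awk; the input/ directory matches the example job):

    # Print the length, in characters, of the longest line across the input files
    awk '{ if (length($0) > max) max = length($0) } END { print max }' input/*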

Adi Kish
0

If you are using Hadoop on Amazon EMR, a configuration can be added to increase the heap size:

[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_HEAPSIZE": "2048"
        },
        "Configurations": []
      }
    ]
  }
]
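A sketch of supplying that JSON when creating a cluster with the AWS CLI (the file name, release label, and instance settings are illustrative):

    # Save the JSON above as hadoop-heap.json, then reference it at cluster creation time
    aws emr create-cluster \
        --name "hadoop-heap-example" \
        --release-label emr-5.30.0 \
        --applications Name=Hadoop \
        --use-default-roles \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --configurations file://hadoop-heap.json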
Jay Prall