61

I tried installing Hadoop following this document: http://hadoop.apache.org/common/docs/stable/single_node_setup.html. When I tried executing this command:

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 

I am getting the following exception:

java.lang.OutOfMemoryError: Java heap space

Please suggest a solution so that I can try out the example. The entire exception is listed below. I am new to Hadoop and I might have done something dumb. Any suggestion will be highly appreciated.

anuj@anuj-VPCEA13EN:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/12/11 17:38:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/11 17:38:22 INFO mapred.FileInputFormat: Total input paths to process : 7
11/12/11 17:38:22 INFO mapred.JobClient: Running job: job_local_0001
11/12/11 17:38:22 INFO util.ProcessTree: setsid exited with exit code 0
11/12/11 17:38:22 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e49dcd
11/12/11 17:38:22 INFO mapred.MapTask: numReduceTasks: 1
11/12/11 17:38:22 INFO mapred.MapTask: io.sort.mb = 100
11/12/11 17:38:22 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
11/12/11 17:38:23 INFO mapred.JobClient:  map 0% reduce 0%
11/12/11 17:38:23 INFO mapred.JobClient: Job complete: job_local_0001
11/12/11 17:38:23 INFO mapred.JobClient: Counters: 0
11/12/11 17:38:23 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1257)
    at org.apache.hadoop.examples.Grep.run(Grep.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Grep.main(Grep.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Anuj
  • What does your input file contain? – Tudor Dec 11 '11 at 14:12
  • I also suspect that the file has one huge line – David Gruzman Dec 11 '11 at 20:04
  • I'm having this same issue with Hadoop 1.0.0, input is as per getting started wiki page - http://wiki.apache.org/hadoop/GettingStartedWithHadoop. Tried all three solutions here, none of which seem to have any impact at all. – tbroberg Feb 07 '12 at 08:45
  • Solved my problem. Hadoop was giving the /etc/hadoop config directory precedence over the conf directory, which messed me up. I debugged this by modifying the bin/hadoop script to print out the java command line at the bottom instead of executing it. – tbroberg Feb 08 '12 at 02:57

16 Answers

80

For anyone using RPM or DEB packages, the documentation and common advice are misleading. These packages install Hadoop configuration files into /etc/hadoop, and these take priority over other settings.

The file /etc/hadoop/hadoop-env.sh sets the maximum Java heap memory for Hadoop. By default it is:

   export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"

This Xmx setting is too low. Simply change it to the following and rerun:

   export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
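A quick sanity check (a sketch, assuming the RPM/DEB layout described above) is to confirm which hadoop-env.sh is actually being read and what it sets, then rerun the example:

    # Show the client heap setting Hadoop will pick up from the packaged config
    grep HADOOP_CLIENT_OPTS /etc/hadoop/hadoop-env.sh
    # Rerun the failing example job
    bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'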
Zach Garner
  • I just had the exact same problem as the OP, and I was using the RPM package. This fixed the problem. Upvoted. – Aaron Burke Aug 12 '13 at 20:13
  • One could also set the property as final in mapred-site.xml, and then it will not be overwritten: <property><name>mapred.child.java.opts</name><value>-Xmx1024m</value><final>true</final></property> – sufinawaz Sep 30 '13 at 14:13
  • In Hadoop 2.x.x, hadoop-env.sh can be found at /etc/hadoop/conf/hadoop-env.sh – Pradeep Dec 23 '13 at 07:47
  • @Zach As developers we do not have access to change any conf file. Is there any way to set this property when submitting the Hadoop job? – Indrajeet Gour Jan 15 '16 at 12:21
40

You can assign more memory by editing the conf/mapred-site.xml file and adding the property:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

This will start the Hadoop JVMs with more heap space.
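The property goes inside the <configuration> element of conf/mapred-site.xml. A minimal sketch of applying the change with a Hadoop 1.x tarball layout (as noted in the comments, the daemons must be restarted for it to take effect):

    # After editing conf/mapred-site.xml, restart the daemons so the new setting is picked up
    bin/stop-all.sh
    bin/start-all.sh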

Tudor
  • @Anuj, try setting the number even higher. If not even 2048m is enough, there must be a problem with the implementation. – Tudor Dec 11 '11 at 13:03
  • I need to edit mapred-site.xml and re-execute the command, right? But that didn't work :( – Anuj Dec 11 '11 at 13:05
  • @Anuj: yes that's what you need to do. – Tudor Dec 11 '11 at 13:06
  • Don't forget that you need to stop and restart the daemons if you change the properties file. – Tudor Dec 12 '11 at 09:23
  • Where is this mapred-site.xml located? And does single user need this config file? – mythicalprogrammer Jan 06 '12 at 02:23
  • @Anuj you might try substantially less. You might not be able to allocate enough memory to start the job. – Carlos Rendon Nov 02 '12 at 20:11
  • @Tudor : Is it necessary that the java heap space usage cannot go beyond 2048M for a map process ? I run a job which is data structure intensive where the java heap usage is way beyond 2048M, does this mean that I will not be able to run this job ? – harry potter Jun 05 '13 at 13:36
  • @harry potter: Not really, you can set it as high as you want. The suggestion I made earlier doesn't apply if you're certain you actually need all that space. – Tudor Jun 05 '13 at 14:57
  • The file is now located in $HADOOP_HOME/etc/hadoop/mapred-site.xml – Ben Mathews May 16 '14 at 20:38
12

Another possibility is editing hadoop-env.sh, which contains export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS". Changing 128m to 1024m helped in my case (Hadoop 1.0.0.1 on Debian).
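A minimal sketch of making that change in place (assuming a tarball install with a conf/ directory; for packaged installs the file lives under /etc/hadoop instead):

    # Bump the client heap from 128m to 1024m, keeping a backup of the original file
    sed -i.bak 's/-Xmx128m/-Xmx1024m/' conf/hadoop-env.sh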

7

After trying many combinations, I finally concluded that the same error in my environment (Ubuntu 12.04, Hadoop 1.0.4) was due to two issues:

  1. Same as Zach Garner mentioned above.
  2. Don't forget to execute "ssh localhost" first. Believe it or not, a missing ssh connection can also produce a Java heap space error (see the sketch below).
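A sketch of the standard passwordless-ssh setup from the single-node setup guide, for anyone hitting point 2:

    # Generate a key with an empty passphrase and authorize it for logins to localhost
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    # Verify that this now works without a password prompt
    ssh localhost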
etlolap
  • I had this problem and "ssh localhost" worked for me! Why does hadoop need to run on ssh for standalone operation? – calvin Aug 04 '13 at 18:31
6

You need to make adjustments to mapreduce.{map|reduce}.java.opts and also to mapreduce.{map|reduce}.memory.mb.

For example:

  hadoop jar <jarName> <fqcn> \
      -Dmapreduce.map.memory.mb=4096 \
      -Dmapreduce.map.java.opts=-Xmx3686m

Here is a good resource with an answer to this question.
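For completeness, the reduce side has analogous settings; a sketch with illustrative values (keeping -Xmx at roughly 80-90% of the container size):

    hadoop jar <jarName> <fqcn> \
        -Dmapreduce.reduce.memory.mb=4096 \
        -Dmapreduce.reduce.java.opts=-Xmx3686m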

tworec
4

We faced the same situation.

Modifying the hadoop-env.sh worked out for me.

The export HADOOP_HEAPSIZE line is commented out by default; uncomment it and provide a heap size of your choice.

By default, the heap size assigned is 1000 MB.
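For illustration, the relevant line in conf/hadoop-env.sh looks like this once uncommented (the value here is just an example):

    # The maximum amount of heap to use, in MB. Default is 1000.
    export HADOOP_HEAPSIZE=2000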

AlexVogel
4

You can solve this problem by editing the file /etc/hadoop/hadoop-env.sh.

Hadoop was giving the /etc/hadoop config directory precedence over the conf directory.

I also ran into the same situation.

wufawei
2

I installed Hadoop 1.0.4 from the binary tarball and had the out-of-memory problem. I tried Tudor's, Zach Garner's, Nishant Nagwani's and Andris Birkmanis's solutions, but none of them worked for me.

Editing bin/hadoop to ignore $HADOOP_CLIENT_OPTS worked for me:

...
elif [ "$COMMAND" = "jar" ] ; then
     CLASS=org.apache.hadoop.util.RunJar
    #Line changed this line to avoid out of memory error:
    #HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    # changed to:
     HADOOP_OPTS="$HADOOP_OPTS "
...

I'm assuming that there is a better way to do this but I could not find it.
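One less invasive alternative (just a sketch, echoing the suggestion in the comment below) is to leave bin/hadoop untouched and override the variable in the calling shell; later -Xmx flags generally take precedence over the 128m prepended by hadoop-env.sh:

    # Override the client heap just for this shell session, instead of editing bin/hadoop
    export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
    bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'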

Brian C.
  • Are you running on a virtual machine? Most OOMs happen when running Hadoop on a VM with very little memory. Removing $HADOOP_CLIENT_OPTS is not a good idea in production, because you will have to keep a check on the memory being used. Give a bigger value rather than totally removing HADOOP_CLIENT_OPTS, e.g.: export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS" – Anuj Nov 14 '12 at 08:53
2

I hit the same exception on Ubuntu with Hadoop 1.1.1. The solution was simple: edit the shell variable $HADOOP_CLIENT_OPTS set by some init script. But it took a long time to find it =(

Odysseus
2

Run your job like the one below:

bin/hadoop jar hadoop-examples-*.jar grep -D mapred.child.java.opts=-Xmx1024M input output 'dfs[a-z.]+' 

The heap space is set to 32 MB or 64 MB by default. You can increase the heap space in the properties file, as Tudor pointed out, or you can change it for this particular job by setting the property on the command line, as above.

Nishant Nagwani
1

Make sure mapred.child.java.opts provides enough memory to run the MapReduce job. Also ensure that mapreduce.task.io.sort.mb is less than the heap given in mapred.child.java.opts.

Example:

 mapred.child.java.opts=-Xmx2048m
 mapreduce.task.io.sort.mb=100

Otherwise you'll hit the OOM issue even if HADOOP_CLIENT_OPTS in hadoop-env.sh is configured with enough memory.
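As a sketch, the same pair can also be passed per job on the command line, using the Hadoop 1.x names that already appear in the question's log (values are purely illustrative):

    # Per-job override: the 100 MB sort buffer fits comfortably inside the 2048 MB task heap
    bin/hadoop jar hadoop-examples-*.jar grep \
        -D mapred.child.java.opts=-Xmx2048m \
        -D io.sort.mb=100 \
        input output 'dfs[a-z.]+'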

S.K. Venkat
1

Configure the JVM heap size for your map and reduce processes. These sizes need to be less than the physical memory configured for the containers. As a general rule, they should be 80% of the YARN physical memory settings.

Configure mapreduce.map.java.opts and mapreduce.reduce.java.opts to set the map and reduce heap sizes respectively, e.g.

<property>  
   <name>mapreduce.map.java.opts</name>  
   <value>-Xmx1638m</value>
</property>
<property>  
   <name>mapreduce.reduce.java.opts</name>  
   <value>-Xmx3278m</value>
</property>
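Those -Xmx values are roughly 80% of assumed container sizes of 2048 MB (map) and 4096 MB (reduce); the matching container settings could be passed per job like this (my-app.jar and MyJob are placeholders for a job driver that uses ToolRunner):

    # Illustrative container sizes; the heap (-Xmx) values above are about 80% of these
    hadoop jar my-app.jar MyJob \
        -Dmapreduce.map.memory.mb=2048 \
        -Dmapreduce.reduce.memory.mb=4096 \
        input output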
Pravat Sutar
  • @Pravat_Sutar: Welcome to SO! Could you please edit your question a bit and add a bit more information, like where that config snippet come from? For more guidance, see also https://stackoverflow.com/help/how-to-answer – B--rian Aug 15 '19 at 06:58
0

Exporting the variables by running the following command worked for me:

. conf/hadoop-env.sh
0

On Ubuntu using a DEB install (at least for Hadoop 1.2.1), there is a /etc/profile.d/hadoop-env.sh symlink created to /etc/hadoop/hadoop-env.sh, which causes it to be loaded every time you log in. In my experience this is not necessary, because the /usr/bin/hadoop wrapper itself will eventually source it (through /usr/libexec/hadoop-config.sh). On my system I removed the symlink, and I no longer get weird issues when changing the value of -Xmx in HADOOP_CLIENT_OPTS (previously, every time that hadoop-env.sh script ran, the client options environment variable was updated while keeping the old value).
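If you want to do the same, a minimal sketch (paths as described above for the Hadoop 1.2.1 DEB install):

    # Check where the login-time symlink points, then remove it
    ls -l /etc/profile.d/hadoop-env.sh
    sudo rm /etc/profile.d/hadoop-env.sh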

borice
0

I ended up with a very similar issue last week. The input file I was using had a huge line in it that I could not even view; that single line was almost 95% of my file size (95% of 1 GB!). I would suggest you take a look at your input files first. You might have a malformed input file that you need to look into. Try increasing the heap space after you check the input file.
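A quick way to check for such a line (a sketch using plain awk; the input/ directory matches the example job):

    # Print the length, in characters, of the longest line across the input files
    awk '{ if (length($0) > max) max = length($0) } END { print max }' input/*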

Adi Kish
0

If you are using Hadoop on Amazon EMR, a configuration can be added to increase the heap size:

[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_HEAPSIZE": "2048"
        },
        "Configurations": []
      }
    ]
  }
]
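A sketch of supplying that JSON when creating a cluster with the AWS CLI (the file name, release label, and instance settings are illustrative):

    # Save the JSON above as hadoop-heap.json, then reference it at cluster creation time
    aws emr create-cluster \
        --name "hadoop-heap-example" \
        --release-label emr-5.30.0 \
        --applications Name=Hadoop \
        --use-default-roles \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --configurations file://hadoop-heap.json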
Jay Prall