how to avoid mapreduce OutOfMemory Java heap space error while using kite-dataset to import data?

Question

on my hortonworks HDP 2.6 cluster, I'm using kite-dataset tool to import data:

./kite-dataset -v csv-import ml-100k/u.data  ratings

I'm getting this error:

java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:986)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

My cluster nodes have 16 GB or RAM, some of which is listed as available.

What can I do to avoid this error?

It looks like this runs a mapreduce job and your mappers run out of memory. You probably need to set `mapred.map.child.java.opts` to a higher value. Kite might let you pass this in as an argument, or you might need to change it in the `mapred-site.xml` — Binary Nerd, Jun 13 '16 at 19:01

score 0 · Answer 1 · answered Mar 02 '17 at 11:08

My first impulse would be to ask what your startup parameters are. Typically, when you run MapReduce and experience an out-of-memory error, you would use something like the following as your startup params:

-Dmapred.map.child.java.opts=-Xmx1G -Dmapred.reduce.child.java.opts=-Xmx1G

The key here is that these two amounts are cumulative. So, the amounts you specificy added together should not come close to exceeding the memory available on your system after you start MapReduce.

how to avoid mapreduce OutOfMemory Java heap space error while using kite-dataset to import data?

1 Answers1