I am currently developing a Hadoop program. The job is being killed by Hadoop because the mapper tasks use a lot of memory (around 7 GB each). Is there a way to make each machine run only one task at a time?

I tried the settings shown below, but they didn't work; the task was still killed by Hadoop.

conf.set("mapreduce.tasktracker.reserved.physicalmemory.mb", "7000");
conf.set("mapred.tasktracker.map.tasks.maximum", "1");

The cluster is running MapR M3, and every machine has 15.6 GB of memory with about 70% of it available.

Yukun

1 Answer


I think you have to set the child JVM options (this applies to both map and reduce tasks):

mapred.child.java.opts=-Xmx7000m

If the new API is supported, you can specify it for the mapper only with:

mapreduce.map.java.opts=-Xmx7000m
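
For illustration, here is a minimal driver sketch (assuming you submit the job through the org.apache.hadoop.mapreduce API; the class name and job name are just placeholders) showing where these options could be set from code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryHeavyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Old-style property: one heap setting shared by map and reduce child JVMs
        conf.set("mapred.child.java.opts", "-Xmx7000m");

        // New-style property: heap setting applied to map tasks only
        conf.set("mapreduce.map.java.opts", "-Xmx7000m");

        Job job = new Job(conf, "memory-heavy-mapper");
        // ... set mapper/reducer classes, input/output paths, etc. ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}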

I had similar problems and also logged the JVM heap sizes; there is more in this small blog post about checking Java heap sizes.

Note that reducers also run on the nodes, so they may compete for memory; make sure to limit the number of reduce slots as well if necessary.
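
If you really do need one task per machine, the slot limits themselves (shown below with example values for your case) would go into mapred-site.xml on every TaskTracker node; as far as I know they are read when the TaskTracker daemon starts, so they cannot be overridden per job from the driver:

mapred.tasktracker.map.tasks.maximum=1
mapred.tasktracker.reduce.tasks.maximum=1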

DDW
  • Where should I set this value? Like: conf.set("mapreduce.map.java.opts", "-Xmx7000m")? Btw, does this make sure a mapper only runs one task at a time? – Yukun Aug 30 '13 at 14:22
  • You need to set it in combination with your parameter that limits the number of map slots in the tasktracker (but the standard Xmx is set to a much lower value than 7000m, usually below 1000m). You can set the value the same way you set the number of map slots: conf.set() is fine, or you can modify the mapred-site.xml file. – DDW Sep 02 '13 at 07:29