I am currently developing a Hadoop program. The job is being killed by Hadoop because the mapper tasks use a lot of memory (around 7 GB each). Is there a way to make each machine run only one task at a time?

I tried the settings shown below, but they didn't work; the task was still killed by Hadoop.

conf.set("mapreduce.tasktracker.reserved.physicalmemory.mb", "7000");
conf.set("mapred.tasktracker.map.tasks.maximum", "1");

The cluster is running MapR M3, and every machine has 15.6 GB of memory with about 70% of it available.

Yukun

1 Answer


I think you have to set the child JVM options (this applies to both map and reduce tasks):

mapred.child.java.opts=-Xmx7000m

If the new API is supported, you can specify it for the mapper only with:

mapreduce.map.java.opts=-Xmx7000m
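
For illustration, here is a minimal driver sketch (assuming you submit the job through the org.apache.hadoop.mapreduce API; the class name and job name are just placeholders) showing where these options could be set from code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryHeavyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Old-style property: one heap setting shared by map and reduce child JVMs
        conf.set("mapred.child.java.opts", "-Xmx7000m");

        // New-style property: heap setting applied to map tasks only
        conf.set("mapreduce.map.java.opts", "-Xmx7000m");

        Job job = new Job(conf, "memory-heavy-mapper");
        // ... set mapper/reducer classes, input/output paths, etc. ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}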

I had similar problems and also logged the JVM heap sizes; there is more in this small blog post about checking Java heap sizes.

Note that reducers also run on the nodes, so they may compete for memory; make sure to limit the number of reduce slots as well if necessary.
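
If you really do need one task per machine, the slot limits themselves (shown below with example values for your case) would go into mapred-site.xml on every TaskTracker node; as far as I know they are read when the TaskTracker daemon starts, so they cannot be overridden per job from the driver:

mapred.tasktracker.map.tasks.maximum=1
mapred.tasktracker.reduce.tasks.maximum=1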

DDW
  • Where should I set this value? Like: conf.set("mapreduce.map.java.opts", "-Xmx7000m")? Btw, does this make sure a mapper only runs one task at a time? – Yukun Aug 30 '13 at 14:22
  • You need to set it in combination with your parameter that limits the number of map slots in the tasktracker (but the standard Xmx is set to a much lower value than 7000m, usually below 1000m). You can set the value the same way you set the number of map slots: conf.set() is fine, or you can modify the mapred-site.xml file. – DDW Sep 02 '13 at 07:29