MRAppMaster is running beyond physical memory limits

Question

I am trying to troubleshoot a puzzling issue: MRAppMaster exceeds its allocated container memory and is then killed by the NodeManager, even though its heap size is much smaller than the container size.

NM logs:

2017-12-01 11:18:49,863 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14191 for container-id container_1506599288376_62101_01_000001: 1.0 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used
2017-12-01 11:18:49,863 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1506599288376_62101_01_000001 has processes older than 1 iteration running over the configured limit. Limit=1073741824, current usage = 1076969472
2017-12-01 11:18:49,863 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=14191,containerID=container_1506599288376_62101_01_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1506599288376_62101_01_000001 :
        |- 14279 14191 14191 14191 (java) 4915 235 3167825920 262632 /usr/java/default//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Djava.net.preferIPv4Stack=true -Xmx512m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
        |- 14191 14189 14191 14191 (bash) 0 1 108650496 300 /bin/bash -c /usr/java/default//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA  -Djava.net.preferIPv4Stack=true -Xmx512m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001/stdout 2>/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001/stderr

As you can see, the heap size is set to 512 MB, yet the physical memory observed by the NM grows to 1 GB.
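As a sanity check on those numbers, here is a minimal sketch that redoes the NM's accounting from the two process-tree lines above. It assumes the usual dump column order (PID, PPID, PGRPID, SESSID, command name, user time, system time, virtual memory in bytes, RSS in pages, command line) and a 4 KiB page size. Neither is stated in the logs, but with those assumptions the total matches the logged `current usage = 1076969472` exactly.

```python
# Assumed column order of a process-tree dump line:
# PID PPID PGRPID SESSID (CMD_NAME) USER_TIME SYSTEM_TIME VMEM_BYTES RSS_PAGES CMD_LINE
PAGE_SIZE = 4096          # assumption: 4 KiB pages (typical for Linux x86-64)
MIB = 1024 * 1024

# Numeric columns copied from the two dump lines above (command lines omitted).
dump_lines = [
    "14279 14191 14191 14191 (java) 4915 235 3167825920 262632",
    "14191 14189 14191 14191 (bash) 0 1 108650496 300",
]


def parse(line):
    """Extract the command name, virtual memory and resident memory of one process."""
    fields = line.split()
    return {
        "cmd": fields[4].strip("()"),
        "vmem_bytes": int(fields[7]),
        "rss_bytes": int(fields[8]) * PAGE_SIZE,  # RSS is reported in pages
    }


procs = [parse(line) for line in dump_lines]
total_rss = sum(p["rss_bytes"] for p in procs)
total_vmem = sum(p["vmem_bytes"] for p in procs)

limit = 1073741824                      # "Limit=1073741824" from the NM log
heap_max = 512 * MIB                    # -Xmx512m from the command line
java_rss = next(p["rss_bytes"] for p in procs if p["cmd"] == "java")

print("total RSS (bytes):     ", total_rss)
print("over the limit (bytes):", total_rss - limit)
print("total vmem (GiB):       %.2f" % (total_vmem / 1024**3))
print("java RSS above -Xmx (MiB): %.1f" % ((java_rss - heap_max) / MIB))
```

Under these assumptions the JVM alone holds roughly 514 MiB of resident memory beyond `-Xmx`, and that is a lower bound, since the heap need not be fully committed. The container exceeds its 1 GiB limit by only about 3 MiB, which is enough for the monitor to kill it.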

The application is an Oozie launcher (Hive task), so it has a single mapper, which does almost nothing, and no reducer.

What baffles me is that only this specific instance of MRAppMaster is killed, and I cannot explain the roughly 500 MB gap between the max heap size and the physical memory measured by the NM:

- Other MRAppMaster instances run fine even with the default config (yarn.app.mapreduce.am.resource.mb = 1024 and yarn.app.mapreduce.am.command-opts = -Xmx825955249).
- MRAppMaster does not run any application-specific code, so why is only this one having trouble? I expect MRAppMaster memory consumption to be roughly linear in the number of tasks / attempts, and this app has only one mapper.
- -Xmx has been reduced to 512 MB to see whether the issue still happens with ~500 MB of headroom. I expect MRAppMaster to consume very little native memory, so what could those extra 500 MB be?
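To make the headroom comparison concrete, the following sketch computes how much of a 1024 MB container is left for non-heap memory under the two `-Xmx` settings mentioned above. The helper name `headroom_mib` is only illustrative, and the calculation ignores everything except the maximum heap size (metaspace, thread stacks, direct buffers and the JVM itself all have to fit in whatever remains).

```python
MIB = 1024 * 1024


def headroom_mib(container_mb, xmx_bytes):
    """Container memory left for everything outside the Java heap, in MiB."""
    return container_mb - xmx_bytes / MIB


# Default config: yarn.app.mapreduce.am.resource.mb = 1024, -Xmx825955249
default_headroom = headroom_mib(1024, 825955249)

# Reduced heap used for this experiment: -Xmx512m
reduced_headroom = headroom_mib(1024, 512 * MIB)

print("default -Xmx825955249: %.1f MiB of headroom" % default_headroom)
print("reduced -Xmx512m:      %.1f MiB of headroom" % reduced_headroom)
```

The default setting leaves roughly 236 MiB outside the heap and works for the other instances, while this instance is killed even with 512 MiB available, which is what makes the case unusual.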

I will try to work around the issue by increasing yarn.app.mapreduce.am.resource.mb, but I would really like to understand what is going on. Any ideas?

config: cdh-5.4

_"CDH 5.4 ... MRAppMaster does not run any application specific code"_ > are you really sure? In early CDH 5 versions, by default Oozie would run the Launcher process _inside_ the AM container (i.e. YARN would report only 1 container for the whole Application). Don't know why it worked that way, nor how it was configured. And that weird behaviour has changed around CDH 5.9 or 10. — Samson Scharfrichter, Dec 02 '17 at 16:17
In case your Launcher uses off-heap memory you may want to set e.g. `-XX:MaxDirectMemorySize=100M` in the appropriate "java options" property, i.e. `oozie.launcher.yarn.app.mapreduce.am.command-opts` and/or `oozie.launcher.mapreduce.map.java.opts` — Samson Scharfrichter, Dec 02 '17 at 16:29
And to raise the memory quota for that specific Oozie Action, of course you can set `oozie.launcher.yarn.app.mapreduce.am.resource.mb` and/or `oozie.launcher.mapreduce.map.memory.mb` (then fine-tune the heap quota accordingly with the "java options") >>> in case you are not aware of "oozie.launcher" prefix, check https://stackoverflow.com/questions/24262896/oozie-shell-action-memory-limit/24262996#24262996 — Samson Scharfrichter, Dec 02 '17 at 16:41
Yup, pretty sure. The app has two containers, the second one being a mapper with the expected logs (it spawns a new app, the Hive query, and waits until it completes). But I will double-check that uber mode isn't enabled. — Clément MATHIEU, Dec 02 '17 at 16:43
