I am using Hadoop 2.4. The reducer uses several large memory-mapped files (about 8 GB total). The reducer itself uses very little memory. To my knowledge, a memory-mapped file (FileChannel.map in read-only mode) also uses very little JVM memory, since the pages are managed by the OS rather than by the JVM heap.
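The mapping is done roughly like this (a minimal sketch; the class name and the way the path is obtained are illustrative, the FileChannel.map(READ_ONLY, ...) call is what I actually use):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedLookup {
    public static MappedByteBuffer mapReadOnly(String file) throws IOException {
        Path path = Paths.get(file);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Map the whole file read-only; the pages live in the OS page cache,
            // not on the JVM heap, so -Xmx does not need to cover them.
            // Each file is about 1.5 GB, so it fits in a single MappedByteBuffer.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}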
I got this error:
Container [pid=26783,containerID=container_1389136889967_0009_01_000002]
is running beyond physical memory limits.
Current usage: 4.2 GB of 4 GB physical memory used;
5.2 GB of 8.4 GB virtual memory used. Killing container
Here were my settings:
mapreduce.reduce.java.opts=-Xmx2048m
mapreduce.reduce.memory.mb=4096
So I adjusted the parameters to the following, and it worked:
mapreduce.reduce.java.opts=-Xmx10240m
mapreduce.reduce.memory.mb=12288
I further adjusted the parameters and got it to work like this:
mapreduce.reduce.java.opts=-Xmx2048m
mapreduce.reduce.memory.mb=10240
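For reference, the same two properties can also be set programmatically on the job (a sketch using the standard Hadoop 2.x Configuration/Job API; the class and job name are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetup {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        // Per-reducer JVM heap: 2 GB
        conf.set("mapreduce.reduce.java.opts", "-Xmx2048m");
        // Per-reducer YARN container: 10 GB, leaving headroom beyond the heap
        conf.setInt("mapreduce.reduce.memory.mb", 10240);
        return Job.getInstance(conf, "reduce-with-mmap");
    }
}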
My question is: why does the YARN container need about 8 GB more memory than the JVM heap size? The culprit seems to be the large memory-mapped files I use (each about 1.5 GB, summing to about 8 GB). Aren't memory-mapped files managed by the OS, and aren't they supposed to be shareable by multiple processes (e.g. reducers)?
I use an AWS m2.4xlarge instance (67 GB memory), which has about 8 GB unused, so the OS should have sufficient memory. With the current settings, only about 5 reducers fit on each instance, and each reducer is allocated an extra 8 GB. This just looks very wasteful.