My Hive Activity, run through AWS Data Pipeline, is failing with the following error:
Diagnostics: Container [pid=,containerID=] is running beyond physical memory limits.
Current usage: 1.0 GB of 1 GB physical memory used;
2.8 GB of 5 GB virtual memory used. Killing container.
When I ran the Hive script (the one the Hive Activity executes) manually, I had to invoke it as shown below:
hive \
-hiveconf tez.am.resource.memory.mb=16000 \
-hiveconf mapreduce.map.memory.mb=10240 \
-hiveconf mapreduce.map.java.opts=-Xmx8192m \
-hiveconf mapreduce.reduce.memory.mb=10240 \
-hiveconf mapreduce.reduce.java.opts=-Xmx8192m \
-hiveconf hive.exec.parallel=true \
-f <hive script file path>
With these settings, the Hive script executes perfectly. The question is: how do I pass these settings to the Hive Activity in AWS Data Pipeline? I can't find any way to pass -hiveconf options to a Hive Activity.
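One workaround I am considering (an assumption on my part, not something I have confirmed works cleanly with HiveActivity) is to embed the same properties at the top of the Hive script itself with SET statements, which would avoid the need for -hiveconf on the command line entirely:

```sql
-- Equivalent settings embedded in the Hive script,
-- assuming HiveActivity executes the script file unmodified.
SET tez.am.resource.memory.mb=16000;
SET mapreduce.map.memory.mb=10240;
SET mapreduce.map.java.opts=-Xmx8192m;
SET mapreduce.reduce.memory.mb=10240;
SET mapreduce.reduce.java.opts=-Xmx8192m;
SET hive.exec.parallel=true;
```

If that works, the activity would only need to point at the script file, with no extra command-line arguments.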