2

While running a HiveActivity using AWS Data Pipeline, the activity fails with the following error:

Diagnostics: Container [pid=,containerID=] is running beyond physical memory limits. 
Current usage: 1.0 GB of 1 GB physical memory used;
2.8 GB of 5 GB virtual memory used. Killing container. 

When I ran the Hive script (the one executed by the HiveActivity) manually, I had to invoke it as shown below:

hive \
-hiveconf tez.am.resource.memory.mb=16000 \
-hiveconf mapreduce.map.memory.mb=10240 \
-hiveconf mapreduce.map.java.opts=-Xmx8192m \
-hiveconf mapreduce.reduce.memory.mb=10240 \
-hiveconf mapreduce.reduce.java.opts=-Xmx8192m \
-hiveconf hive.exec.parallel=true \
-f <hive script file path>

With these settings the Hive script executes perfectly.

Now the question is: how do I pass these settings to the HiveActivity of AWS Data Pipeline? I can't seem to find any way to pass -hiveconf to a HiveActivity.

Shekhar
  • 11,438
  • 36
  • 130
  • 186

1 Answer

2

How are you calling your Hive script within Data Pipeline? If you use a ShellCommandActivity, you should be able to pass these -hiveconf flags exactly as you would on the command line, and it should run fine.
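If it helps, a minimal ShellCommandActivity object might look like the sketch below. This is an illustration, not the asker's actual pipeline: the object ids, the EMR cluster reference, and the S3 script path are placeholders, and the script would need to be reachable by the `hive -f` invocation (for example, staged onto the cluster or fetched first).

```json
{
  "id": "RunHiveWithConf",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEmrCluster" },
  "command": "hive -hiveconf tez.am.resource.memory.mb=16000 -hiveconf mapreduce.map.memory.mb=10240 -hiveconf mapreduce.map.java.opts=-Xmx8192m -hiveconf mapreduce.reduce.memory.mb=10240 -hiveconf mapreduce.reduce.java.opts=-Xmx8192m -hiveconf hive.exec.parallel=true -f my-script.hql"
}
```

Since `command` is just a shell string run on the cluster, any -hiveconf flags can go into it verbatim, which is what makes this activity type a straightforward workaround.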

Krish
  • 390
  • 4
  • 15
  • ShellCommandActivity can be used but I wanted to know if there is any way to achieve this same thing using HiveActivity itself. – Shekhar Aug 25 '17 at 15:26
  • 1
I know it's a little late, but you should be able to get this done using the "SCRIPT VARIABLE" option in Data Pipeline; see https://stackoverflow.com/questions/23415400/how-to-use-scriptvariables-in-hive-aws-data-pipeline – Krish Oct 04 '17 at 15:26
  • @DhairyaVerma , we couldn't find any solution to this problem back in 2017 when we faced this particular issue. So we ended up using ShellCommandActivity only. Perhaps now in 2020 AWS has some solution to this problem. – Shekhar Nov 09 '20 at 10:48
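For completeness, the script-variable route mentioned in the comments does not pass -hiveconf directly; instead, HiveActivity's `scriptVariable` field substitutes `${...}` placeholders inside the Hive script, so the script itself would set the properties (e.g. it would begin with lines like `SET mapreduce.map.memory.mb=${MAP_MEM};`). A sketch, assuming a script already uploaded to S3 with such placeholders (ids, paths, and variable names here are hypothetical):

```json
{
  "id": "MyHiveActivity",
  "type": "HiveActivity",
  "runsOn": { "ref": "MyEmrCluster" },
  "scriptUri": "s3://my-bucket/scripts/my-script.hql",
  "scriptVariable": [
    "MAP_MEM=10240",
    "REDUCE_MEM=10240"
  ]
}
```

Whether `SET` inside the script covers every property from the question (in particular the Tez application-master setting) would need to be verified against the cluster's Hive configuration.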