
Could you please guide me on resolving this GC overhead / heap space error?

I am trying to insert into a partitioned table from another table (dynamic partitioning) using the query below:

INSERT OVERWRITE table tbl_part PARTITION(county)
SELECT  col1, col2.... col47, county FROM tbl;

I have set the following parameters:

export  HADOOP_CLIENT_OPTS=" -Xmx2048m"
set hive.exec.dynamic.partition=true;  
set hive.exec.dynamic.partition.mode=nonstrict; 
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.max.dynamic.partitions.pernode=256;
set mapreduce.map.memory.mb=2048;
set yarn.scheduler.minimum-allocation-mb=2048;
set hive.exec.max.created.files=250000;
set hive.vectorized.execution.enabled=true;
set hive.merge.smallfiles.avgsize=283115520;
set hive.merge.size.per.task=209715200;
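
For context, a sketch of how I understand the per-task JVM heap relates to the YARN container size (illustrative values; apart from mapreduce.map.memory.mb above, these are not set in my configuration):

-- The JVM heap (-Xmx in *.java.opts) must fit inside the YARN container
-- (*.memory.mb); a common rule of thumb is roughly 80% of the container size.
-- Otherwise the task is either killed by YARN or dies with heap space / GC overhead errors.
set mapreduce.map.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx1638m;
set mapreduce.reduce.memory.mb=2048;
set mapreduce.reduce.java.opts=-Xmx1638m;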

Also added in yarn-site.xml :

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>

<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>

Running free -m:

            total       used       free     shared    buffers     cached
Mem:         15347      11090       4256          0        174       6051
-/+ buffers/cache:       4864      10483
Swap:        15670         18      15652

It's a standalone cluster with 1 core. I am preparing test data to run my unit test cases in Spark.

Could you suggest what else I could do?

The source table has the below properties:

Table Parameters:       
    COLUMN_STATS_ACCURATE   true                
    numFiles                13                  
    numRows                 10509065            
    rawDataSize             3718599422          
    totalSize               3729108487          
    transient_lastDdlTime   1470909228          

Thank you.

– Aavik

1 Answer

Add DISTRIBUTE BY county to your query:

INSERT OVERWRITE table tbl_part PARTITION(county) SELECT  col1, col2.... col47, county FROM tbl DISTRIBUTE BY county;
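
Why this helps (a sketch, with the column list abbreviated as in the question): DISTRIBUTE BY county routes all rows for a given county to the same reducer, so each reducer handles only a subset of the partitions and keeps far fewer partition writers (and their memory buffers) open at once.

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE table tbl_part PARTITION(county)
SELECT col1, col2.... col47, county
FROM tbl
DISTRIBUTE BY county;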
– leftjoin
  • I ran with DISTRIBUTE BY and got the heap space error. – Aavik Aug 14 '16 at 11:11
  • You didn't provide logs, so I'm making an educated guess. Usually this helps. Have you tried increasing the allocated memory? Maybe it really runs out of memory. See this: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-hive-out-of-memory-error-oom/ and this: https://blogs.msdn.microsoft.com/shanyu/2014/07/31/hadoop-yarn-memory-settings-in-hdinsight/ – leftjoin Aug 14 '16 at 15:30
  • set hive.vectorized.execution.enabled=true; set hive.vectorized.execution.reduce.enabled = true; set hive.vectorized.execution.reduce.groupby.enabled = true; set yarn.nodemanager.resource.memory-mb=8192; set yarn.scheduler.minimum-allocation-mb=2048; set yarn.scheduler.maximum-allocation-mb=8192; SET hive.tez.container.size=7168; SET hive.tez.java.opts=-Xmx4096m; – Aavik Aug 15 '16 at 08:59
  • URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1464973979589_0172&tipid=task_1464973979589_0172_r_000013 ----- Diagnostic Messages for this Task: Error: Java heap space FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 13 Reduce: 15 Cumulative CPU: 1212.26 sec HDFS Read: 3729681833 HDFS Write: 408552289 FAIL Total MapReduce CPU Time Spent: 20 minutes 12 seconds 260 msec – Aavik Aug 15 '16 at 08:59
  • Should I increase hive.tez.java.opts? The free memory is only ~12 GB: total used free shared buffers cached Mem: 15347 8041 7306 0 179 5213 -/+ buffers/cache: 2648 12698 Swap: 15670 15 15655. Please suggest. – Aavik Aug 15 '16 at 09:00
  • You can increase it per query if necessary. Do not increase it if not necessary. I have used 10 GB with 15 GB totally free. – leftjoin Aug 15 '16 at 13:04
  • BTW, are you loading an ORC table? – leftjoin Aug 15 '16 at 13:05
  • It seems you are running MR, not Tez, right? Then hive.tez.java.opts will have no effect on MR. Better to look at the failed reducer's log in the job tracker to find the exact root cause of the OOM exception, i.e. what exactly it fails on. – leftjoin Aug 15 '16 at 13:12
  • Now I tried reducing the data by specifying a WHERE clause on county and increasing hive.tez.container.size=10240; and hive.tez.java.opts=-Xmx8192m;. The job ran and failed with a heap space error when the reducer was at 98%. Out of 15 reducers, 14 ran successfully. So I tried setting mapreduce.reduce.java.opts="-Xmx2048m"; and yarn.app.mapreduce.am.command-opts="-Xmx2048m";. This time the job did not fail as soon as it started. These two parameters are not in hive-site.xml. Please suggest. Thank you. – Aavik Aug 15 '16 at 13:12
  • The target table is a parquet partitioned table. – Aavik Aug 15 '16 at 13:14
  • The hive.execution.engine is mr – Aavik Aug 15 '16 at 13:19
  • mapreduce.reduce.java.opts="-Xmx2048m"; and yarn.app.mapreduce.am.command-opts="-Xmx2048m"; are quite normal settings for default values. I suggest setting them at the query level (before each query, when necessary). – leftjoin Aug 15 '16 at 13:20
  • There is one more important setting affecting the memory required by a reducer: set hive.exec.reducers.bytes.per.reducer=67108864; The smaller this figure, the more reducers will start, increasing parallelism and reducing the pressure on a single reducer. Try decreasing it and check the performance (see the sketch after this thread). Too many reducers is not good either, because they will be waiting in the queue. – leftjoin Aug 15 '16 at 13:24
  • Thank you so much!!! The hive.exec.reducers.bytes.per.reducer setting worked. The reducers ran serially instead of in parallel and completed in 6 min 46 sec. Learned a new thing today. – Aavik Aug 15 '16 at 16:44
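
A rough sketch of the arithmetic behind that last setting, assuming the MR engine and using the figures from the failed job above:

-- On MR, Hive plans roughly one reducer per hive.exec.reducers.bytes.per.reducer
-- bytes of input (capped by hive.exec.reducers.max). The failed job read ~3.7 GB
-- with 15 reducers; dropping the per-reducer figure to 64 MB plans many more,
-- smaller reducers, so each one buffers far less data for the dynamic-partition
-- Parquet writers.
set hive.exec.reducers.bytes.per.reducer=67108864;   -- 64 MB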