
I am facing this error when performing partitioning by date on a Hive table that has more than 70 columns:

    ERROR : Status: Failed
    ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1612203694878_0265_4_00, diagnostics=[Task failed, taskId=task_1612203694878_0265_4_00_000058, diagnostics=[TaskAttempt 0 failed, info=[Container container_e16_1612203694878_0265_01_000167 finished with diagnostics set to [Container failed, exitCode=-104.
    [2021-02-02 11:00:58.498]Container [pid=1577,containerID=container_e16_1612203694878_0265_01_000167] is running 3022848B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container.
    Dump of the process-tree for container_e16_1612203694878_0265_01_000167 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 1577 1567 1577 1577 (bash) 0 0 116011008 301 /bin/bash -c /usr/jdk64/jdk1.8.0_112/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhdp.version=3.1.4.0-315 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/usr/hadoop/yarn/local/usercache/hive/appcache/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/tmp org.apache.tez.runtime.task.TezChild slave-06-n.fawryhq.corp 43250 container_e16_1612203694878_0265_01_000167 application_1612203694878_0265 1 1>/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/stdout 2>/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/stderr
    |- 1658 1577 1577 1577 (java) 1414 128 2788896768 262581 /usr/jdk64/jdk1.8.0_112/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhdp.version=3.1.4.0-315 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/usr/hadoop/yarn/local/usercache/hive/appcache/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/tmp org.apache.tez.runtime.task.TezChild slave-06-n.fawryhq.corp 43250 container_e16_1612203694878_0265_01_000167 application_1612203694878_0265 1
    [2021-02-02 11:00:58.512]Container killed on request. Exit code is 143
    [2021-02-02 11:00:58.521]Container exited with a non-zero exit code 143.
    ]], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.allocateSpace(PipelinedSorter.java:256)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:205)
        at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:146)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:193)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    , errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.allocateSpace(PipelinedSorter.java:256)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:205)
        at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:146)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:193)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:17, Vertex vertex_1612203694878_0265_4_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
    ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1612203694878_0265_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:2, Vertex vertex_1612203694878_0265_4_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
    ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
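
For context, "partitioning by date" here is presumably a dynamic-partition insert; a minimal sketch of its general shape follows (the question does not show the actual statement, so the table and column names below are hypothetical placeholders):

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;

    -- hypothetical stand-in for the real 70+ column load
    insert overwrite table target_table partition (load_date)
    select id, name, amount, load_date
    from source_table;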

A.james

2 Answers


Try the following, in this order:

  1. Increase mapper parallelism. The goal is to get more, smaller mappers. Check how many mappers the job starts and adjust the figures accordingly. This will not work if you have very big files in a non-splittable format such as gzip; in that case, proceed with the next two steps. A combined example follows this list.

    --This is an example; check your current settings and adjust to get 2x or more mappers
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set tez.grouping.max-size=32000000; --bigger files will be split
    set tez.grouping.min-size=32000;    --smaller files will be combined into a single mapper
    
    
  2. Disable map-side aggregation (it buffers an in-memory hash table in the mapper heap and often leads to OOM)

    set hive.map.aggr=false;
    
    
  3. Increase mapper memory if the two steps above do not help (try to find the minimum working container size)

    set hive.tez.container.size=9216; --Adjust figures and choose the minimum working size
    set hive.tez.java.opts=-Xmx6144m;
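
Putting the three steps together, a minimal sketch of a single session (the figures are the ones suggested above and should be tuned for your cluster; apply step 3 only if steps 1 and 2 are not enough):

    -- step 1: more, smaller mappers
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set tez.grouping.max-size=32000000;
    set tez.grouping.min-size=32000;
    -- step 2: disable map-side aggregation
    set hive.map.aggr=false;
    -- step 3: last resort, bigger containers and heap
    set hive.tez.container.size=9216;
    set hive.tez.java.opts=-Xmx6144m;
    -- ...then run the date-partitioned insert as usual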
    
    
leftjoin

When you work with Hive on Tez, you have to define at least all of these parameters, for example:

    set hive.tez.container.size=8192;
    set tez.am.resource.memory.mb=8192;
    set tez.runtime.io.sort.mb=2048;
    set hive.tez.java.opts=-Xmx6144m;
    set tez.am.launch.cmd-opts=-Xmx4096m;
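
A hedged annotation of what each of these controls, plus a common sizing rule of thumb (the ratios are general guidance, not part of the original answer). Note that the OutOfMemoryError in the question is thrown from PipelinedSorter.allocateSpace, which is exactly the buffer that tez.runtime.io.sort.mb sizes:

    set hive.tez.container.size=8192;     -- YARN container for each Tez task, in MB
    set tez.am.resource.memory.mb=8192;   -- YARN container for the Tez Application Master, in MB
    set tez.runtime.io.sort.mb=2048;      -- sort buffer per task; must fit comfortably inside the task heap
    set hive.tez.java.opts=-Xmx6144m;     -- task JVM heap, roughly 75% of hive.tez.container.size
    set tez.am.launch.cmd-opts=-Xmx4096m; -- AM JVM heap, kept well below tez.am.resource.memory.mb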