
env

  • Hive 3.1.2
  • Tez 0.10.2
  • Hadoop 3.2.1

trouble

I am using Tez with Hive LLAP.
I set up Tez following the official documentation.

LLAP itself works, but the Tez job keeps failing.
The same query succeeds when run in container mode.
The job fails as soon as it moves on to the reduce stage.

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .........       llap       RUNNING     59         56        0        3     134       8
Reducer 2 ....        llap       RUNNING     51         36        0       15       0     124
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [=====================>>-----] 83%   ELAPSED TIME: 12.35 s
----------------------------------------------------------------------------------------------
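The counters in this table already hint at where the real failure is: Map 1 reports 134 FAILED attempts, while Reducer 2 reports 0 FAILED and 124 KILLED. Below is a minimal sketch that pulls those counters out of the console output so the failing vertex can be picked out mechanically. It assumes Hive keeps the column layout shown above; the regex and the `parse_progress` helper are hypothetical, not part of Hive.

```python
import re

# One vertex row of the Hive/Tez progress table, e.g.
# "Map 1 .........       llap       RUNNING     59         56        0        3     134       8"
# (assumes the column order shown in the question's output).
ROW = re.compile(
    r"^(?P<vertex>.+?)\s*\.+\s+(?P<mode>\S+)\s+(?P<status>\S+)"
    r"\s+(?P<total>\d+)\s+(?P<completed>\d+)\s+(?P<running>\d+)"
    r"\s+(?P<pending>\d+)\s+(?P<failed>\d+)\s+(?P<killed>\d+)\s*$"
)
COUNTS = ("total", "completed", "running", "pending", "failed", "killed")


def parse_progress(text):
    """Return one dict per vertex row; header, separator and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            for key in COUNTS:
                row[key] = int(row[key])
            rows.append(row)
    return rows


# The progress output from the question, copied verbatim.
SAMPLE = """\
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .........       llap       RUNNING     59         56        0        3     134       8
Reducer 2 ....        llap       RUNNING     51         36        0       15       0     124
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [=====================>>-----] 83%   ELAPSED TIME: 12.35 s
----------------------------------------------------------------------------------------------
"""

rows = parse_progress(SAMPLE)
# Only vertices with FAILED attempts have logs that show the root cause.
failing = [r["vertex"] for r in rows if r["failed"] > 0]
print(failing)  # ['Map 1']
```

In other words, the reducer attempts are being killed, not failing on their own, so the map-side FAILED attempts are the ones worth investigating.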

No OOM occurs.
In the error log, the following error repeats once the reduce stage starts.
Has anyone run into the same error?

2021-09-15T10:46:19,515 ERROR [TezTR-470960_251_1_1_32_0 (1631527470960_0251_1_01_000032_0)] tez.TezProcessor: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
    at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.awaitCondition(InputReadyTracker.java:147)
    at org.apache.tez.runtime.InputReadyTracker.waitForAllInputsReady(InputReadyTracker.java:107)
    at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAllInputsReady(TezProcessorContextImpl.java:138)
    at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAllInputsReady(TezProcessorContextImpl.java:133)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:122)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
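The log line identifies the task attempt only by an ID. The sketch below decodes it, assuming the usual Tez attempt ID layout `attempt_<clusterTimestamp>_<appId>_<dagId>_<vertexId>_<taskId>_<attemptNumber>`. The field names and both helpers are hypothetical labels for this sketch, not Tez APIs. Decoding `1631527470960_0251_1_01_000032_0` gives vertex 1, task 32, attempt 0; together with the `ReduceRecordProcessor` frame, this places the trace in Reducer 2, the vertex the progress table shows with 0 FAILED and 124 KILLED attempts.

```python
import re
from typing import NamedTuple


class TezAttemptId(NamedTuple):
    # Descriptive field labels chosen for this sketch.
    cluster_timestamp: int
    app_id: int
    dag_id: int
    vertex_id: int
    task_id: int
    attempt: int


def parse_attempt_id(raw):
    """Decode 'attempt_<ts>_<app>_<dag>_<vertex>_<task>_<n>', with or without the prefix."""
    if raw.startswith("attempt_"):
        raw = raw[len("attempt_"):]
    parts = raw.split("_")
    if len(parts) != 6 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a Tez attempt id: {raw!r}")
    return TezAttemptId(*map(int, parts))


# In this log format the full ID sits in parentheses after the abbreviated
# thread name ("TezTR-470960_251_1_1_32_0"), so only the parenthesized form is matched.
LOG_ID_RE = re.compile(r"\(((?:attempt_)?\d+(?:_\d+){5})\)")


def attempt_from_log_line(line):
    """Return the attempt ID found in a TezProcessor log line, or None."""
    match = LOG_ID_RE.search(line)
    return parse_attempt_id(match.group(1)) if match else None


line = ("2021-09-15T10:46:19,515 ERROR [TezTR-470960_251_1_1_32_0 "
        "(1631527470960_0251_1_01_000032_0)] tez.TezProcessor: java.lang.InterruptedException")
print(attempt_from_log_line(line))
# TezAttemptId(cluster_timestamp=1631527470960, app_id=251, dag_id=1, vertex_id=1, task_id=32, attempt=0)
```

Since this attempt was interrupted while still blocked in `waitForAllInputsReady`, the trace is consistent with the reducer being killed while it waited for map output, rather than failing by itself.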

In my case Tez 0.10.1 had specific issues related to Jetty and Snappy, so I am currently using the 0.10.2 version posted on GitHub.

hoon
  • Try to find the FAILED map attempt log, not a KILLED one. The one you provided is KILLED. After a single container fails 3 times, it finally fails: all other attempts are killed and the whole task fails. Dig into the job tracker and find the FAILED attempt logs – leftjoin Sep 15 '21 at 08:15
  • There is an issue with Hive 3.1.2 and Tez 0.10.1: https://issues.apache.org/jira/browse/HIVE-23190 – hoon Sep 16 '21 at 07:10
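The retry behaviour described in the first comment can be modelled in a few lines. This is a deliberately simplified sketch, not Tez's actual scheduler: `replay_job` and its inputs are hypothetical, and the per-task limit is a parameter because the real value comes from Tez configuration (`tez.am.task.max.failed.attempts`), which should be checked in your own setup rather than assumed to be 3.

```python
def replay_job(task_ids, attempt_outcomes, max_failed_attempts):
    """Replay attempt outcomes in order under a simplified retry model.

    task_ids: every task in the job.
    attempt_outcomes: (task_id, "SUCCEEDED" or "FAILED") pairs in completion order.
    Returns the overall status, the task that ran out of retries (if any), and
    the unfinished tasks whose running attempts end up KILLED as a consequence.
    """
    failures = {task_id: 0 for task_id in task_ids}
    finished = set()
    for task_id, outcome in attempt_outcomes:
        if outcome == "SUCCEEDED":
            finished.add(task_id)
            continue
        failures[task_id] += 1
        if failures[task_id] >= max_failed_attempts:
            # This task is out of retries, so the job fails and every other
            # unfinished task is killed: those attempts report KILLED, not FAILED.
            killed = [t for t in task_ids if t not in finished and t != task_id]
            return {"status": "FAILED", "culprit": task_id, "killed": killed}
    return {"status": "SUCCEEDED", "culprit": None, "killed": []}


# Hypothetical example: one map task fails three times while a reducer waits.
result = replay_job(
    ["map_0", "map_1", "reduce_0"],
    [("map_0", "SUCCEEDED"), ("map_1", "FAILED"), ("map_1", "FAILED"), ("map_1", "FAILED")],
    max_failed_attempts=3,
)
print(result)  # {'status': 'FAILED', 'culprit': 'map_1', 'killed': ['reduce_0']}
```

In this model the reducer never fails by itself; it is only killed, which is why its log shows an interruption and the useful error lives in the culprit task's FAILED attempts.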

0 Answers