I have:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.tez.container.size=8192;
set tez.runtime.io.sort.mb=3276;
set tez.runtime.unordered.output.buffer.size-mb=819;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1610612736;
...to purposefully trigger OOM Java heap space issue. But the error:
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeValue(MapJoinBytesTableContainer.java:333)
at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.writeFirstValueRecord(BytesBytesMultiHashMap.java:896)
at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:440)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:450)
at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:242)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:388)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:457)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:241)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$$Lambda$75/510718713.call(Unknown Source)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
... 4 more
, errorMessage=Cannot recover from this error:java.lang.RuntimeException: Reducer 3 operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:265)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:270)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:253)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Shows that the MapJoinOperator
was running on a Reducer. I thought Map Join runs on Mappers, and all Mappers have completed.
Why does a Map Join Operator run on a Reducer?