1

I'm using Hive version 3.1.3 on Hadoop 3.3.4 with Tez 0.9.2. I'm trying to run a SELECT statement on table that Hive created and manages. The query never finishes and fails. The full error message is below, but this appears to be the relevant portion:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector

It looks like the error is a long to decimal conversion issue. However, this table was created by Hive, loading/transforming data in a previous step. Wouldn't Hive have thrown an error earlier if it was inserting an invalid value into a decimal column?

I used the exact same codebase and the exact same data on AWS EMR and didn't get this error, so I don't think there's an invalid value. But I'm stuck on where to go from here.

Here's the table definition:

claimid             varchar(50)
claimlineid         int
dos                 date
dosto               date
member              varchar(50)
provider            varchar(50)
setname             varchar(255)
code                varchar(50)
system              varchar(255)
primary             int
positivenegative    int
result              decimal(10,2)
supply              int
size                decimal(10,2)
quantity            decimal(10,2)

And here's the full error message:

Vertex failed, vertexName=Map 1, vertexId=vertex_1667735849290_0030_32_15, diagnostics=[Task failed, taskId=task_1667735849290_0030_32_15_000009, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1667735849290_0030_32_15_000009_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:488)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
        ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)].  Read field #14 at field start position 0 current read offset 104
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:611)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.closeOp(VectorMapJoinGenerateResultOperator.java:681)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:757)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:477)
        ... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)].  Read field #14 at field start position 0 current read offset 104
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:609)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:671)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:604)
        ... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)].  Read field #14 at field start position 0 current read offset 104
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:589)
        ... 23 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
        at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:687)
        at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:934)
        at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:585)
        ... 23 more
Patrick Tucci
  • 1,824
  • 1
  • 16
  • 22
  • 1
    Hive uses "schema on read", so no, the error wouldn't happen when writing data. – OneCricketeer Nov 07 '22 at 16:46
  • According to the error, it wants your 14th field to be a DOUBLE or FLOAT, not DECIMAL (perhaps when you wrote it, it was truncated to remove the decimal) – OneCricketeer Nov 07 '22 at 16:48
  • 1
    [HIVE-23909](https://issues.apache.org/jira/browse/HIVE-23909)? Can you disable vectorization (set hive.vectorized.execution.enabled=false) and try? – mazaneicha Nov 07 '22 at 17:54
  • 1
    @mazaneicha thanks, I'll give that a try. I've shifted focus to migrating our workload to Spark since this is only the first of many insurmountable issues I've had with Hive/Hadoop/Tez. But I'll give that a shot once my Spark testing is complete. – Patrick Tucci Nov 07 '22 at 18:51
  • Absolutely, just use Spark! No reason to suffer PITA of Hive+Tez if Spark is an option. – mazaneicha Nov 07 '22 at 19:00
  • @mazaneicha yeah it's a shame, I used Hive for a few years back in 2012 at a large insurance company, and it was great. It's really sad how much of a second-class citizen Hive is now. – Patrick Tucci Nov 08 '22 at 21:09

1 Answers1

0

Unfortunately, this is a problem with CBO. You can disable it, run the expression and get the result. set hive.cbo.enable=false;