Given some table manipulation – create table with 2 rows and columns, add 3rd column and insert third row with 3 values
CREATE TABLE concat_test(
one string,
two string
)
STORED AS ORC;
INSERT INTO TABLE concat_test VALUES (1,1), (2,2);
ALTER TABLE concat_test ADD COLUMNS (three string);
INSERT INTO TABLE concat_test VALUES (3,3,3);
alter table concat_test concatenate;
I'm having an exception Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
when I try reading it with Spark
spark.sql("select * from concat_test").collect()
It is obviously connected with columns number. I'm further investigating problem in orc. I didn't find quick fix for such partitions nor the bug described elsewhere. Is there one?
Could anyone try this on the latest hadoop versions? Does the bug exist?
Hive 1.2.1, Spark 2.3.2
UPD. I myself fixed my tables via Hive. Hive queries do work after this manipulation so I created copy tables and did select-insert of the old data to them.