I am creating and writing to a Hive managed table using Spark (the Spark job completes without error). However, querying the same table from the Hive beeline console throws:
`java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)`
I also printed out the contents of the Dataset<Row>, and they are correct.
Below is the Spark code:
SparkSession spark = SparkSession.builder().appName("Spark_Hive").enableHiveSupport().getOrCreate();
HiveContext hiveContext = new HiveContext(sparkContext);
hiveContext.setConf("hive.exec.dynamic.partition", "true");
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict");
spark.sql("CREATE TABLE IF NOT EXISTS table1" + " (num1 INT, num2 INT) STORED AS ORC");
String sql = "with tab1 as(select inline(array(" + "struct(1),struct(2),struct(3),struct(4),struct(5),struct(6)"
+ ")) as num1), tab2 as (select inline(array(" + "struct(1),struct(2),struct(3),struct(4),struct(5),struct(6)"
+ ")) as num2) select * from tab1 t1 join tab2 t2 on t1.num1=t2.num2";
Dataset<Row> df = hiveContext.sql(sql);
JavaRDD<Row> res = df.toJavaRDD();
StructType schema = new StructType().add("num1", "int").add("num2", "int");
Dataset<Row> result = hiveContext.createDataFrame(res, schema);
result.write().format("ORC").mode(SaveMode.Append).insertInto("table_" + tableSuffix);
result.unpersist();
When I run a select query from Hive beeline, I get:
0: jdbc:hive2://host> select * from table1;
INFO : Compiling command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618): select * from table1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:table1.num1, type:int, comment:null), FieldSchema(name:table1.num2, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618); Time taken: 0.11 seconds
INFO : Executing command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618): select * from table_1
INFO : Completed executing command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618); Time taken: 0.003 seconds
INFO : OK
Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)
Please help in resolving this issue.
EDIT: I tried the solutions mentioned in the posts below; none of them worked.