I am creating and writing to a Hive managed table using Spark (the Spark job completes without error). However, querying the same table from the Hive beeline console throws:

java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)

I also printed the contents of the Dataset<Row>, and they are correct.

Below is the Spark code:

SparkSession spark = SparkSession.builder().appName("Spark_Hive").enableHiveSupport().getOrCreate();
HiveContext hiveContext = new HiveContext(sparkContext);
hiveContext.setConf("hive.exec.dynamic.partition", "true");
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict");

spark.sql("CREATE TABLE IF NOT EXISTS table1" + " (num1 INT, num2 INT) STORED AS ORC");

String sql = "with tab1 as(select inline(array(" + "struct(1),struct(2),struct(3),struct(4),struct(5),struct(6)"
        + ")) as num1), tab2 as (select inline(array(" + "struct(1),struct(2),struct(3),struct(4),struct(5),struct(6)"
        + ")) as num2) select * from tab1 t1 join tab2 t2 on t1.num1=t2.num2";

Dataset<Row> df = hiveContext.sql(sql);
JavaRDD<Row> res = df.toJavaRDD();
StructType schema = new StructType().add("num1", "int").add("num2", "int");
Dataset<Row> result = hiveContext.createDataFrame(res, schema);
result.write().format("ORC").mode(SaveMode.Append).insertInto("table_" + tableSuffix);
result.unpersist();

When I run a select query from Hive beeline, I get:

0: jdbc:hive2://host> select * from table1; 
INFO  : Compiling command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618): select * from table1 
INFO  : Semantic Analysis Completed (retrial = false) 
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:table1.num1, type:int, comment:null), FieldSchema(name:table1.num2, type:int, comment:null)], properties:null) 
INFO  : Completed compiling command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618); Time taken: 0.11 seconds 
INFO  : Executing command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618): select * from table_1  
INFO  : Completed executing command(queryId=hive_20210120020726_8814e02b-d413-4ec8-a9ea-9009734b5618); Time taken: 0.003 seconds 
INFO  : OK
Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)

Please help in resolving this issue.

EDIT: I tried the solutions from the posts below; none of them worked.

spark throws error when reading hive table

Table loaded through Spark not accessible in Hive

  • Does this answer your question? [Table loaded through Spark not accessible in Hive](https://stackoverflow.com/questions/52761391/table-loaded-through-spark-not-accessible-in-hive) – blackbishop Jan 19 '21 at 21:58
  • @blackbishop I tried the solution from the linked question. It didn't work. – Souradeep Biswas Jan 20 '21 at 04:01

1 Answer

Your Spark code executes insertInto("table_" + tableSuffix), while the CREATE TABLE statement creates table1. Mind the underscore in the table name.

=> So the insert does not target the table you are querying with: select * from table1
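
One way to rule out this kind of mismatch is to derive both the CREATE statement and the insert target from a single value. The following is only a minimal sketch: `TableTarget` is a hypothetical helper (not part of Spark or Hive), and the column list is copied from the question. The Spark calls are shown as comments because they need a running session.

```java
// Hypothetical helper: holds the table name once so that the CREATE TABLE
// statement and the insertInto(...) target cannot drift apart.
final class TableTarget {
    private final String name;

    TableTarget(String name) {
        // Assumption: plain, unqualified identifiers only (letters, digits, underscore).
        if (name == null || !name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid table name: " + name);
        }
        this.name = name;
    }

    // The exact name to pass to insertInto(...).
    String name() {
        return name;
    }

    // CREATE statement for the same table; columns taken from the question.
    String createOrcSql() {
        return "CREATE TABLE IF NOT EXISTS " + name
                + " (num1 INT, num2 INT) STORED AS ORC";
    }
}

// Intended use with the objects from the question:
//   TableTarget target = new TableTarget("table1");
//   spark.sql(target.createOrcSql());
//   result.write().mode(SaveMode.Append).insertInto(target.name());
```

With this, the table you create, the table you write to, and the table you later query in beeline all come from the same string.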

william xyz
Virgile