
I'm importing a Spark table into Hive:

df.createOrReplaceTempView(table_name);
df = spark.sql("SELECT * FROM "+ table_name);

df.write().format("orc").mode("overwrite").saveAsTable(db_name+"."+table_name);

The table is created successfully, but when I execute a SELECT query on it from Hive I get the following error, and the result has empty columns:

Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)

I've read Table loaded through Spark not accessible in Hive, and I've tried using the hive-warehouse-connector following this link https://docs.cloudera.com/runtime/7.2.2/integrating-hive-and-bi/topics/hive-etl-example.html but I still get the same error when opening the table from Hive.

I've tried a lot but I can't figure out why I get this error. Can anyone explain what the problem is, or whether there is a way to avoid the bucketing that throws this error?

I'm using Spark 2.3.1 and Hive 3 (and I've tried hive-warehouse-connector_2.11-1.0.0.7.1.4.0-203 for the second attempt).

Any help is appreciated!

UPDATE

Using the hive-warehouse-connector (HWC) I'm able to write, but only if the table already exists.

But according to the docs it should create the table automatically. In my case this is not happening. I've tried all save modes ("overwrite", "append", etc.).

usage:

HiveWarehouseSession hive = HiveWarehouseSession
                .session(spark)
                .userPassword(username, password)
                .build();
hive.setDatabase(db_name);

df.write()
          .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
          .option("table", table_name)
          .mode("overwrite")
          .save();
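Since in my tests the write succeeds once the table exists, the workaround I'm currently using is to pre-create the table through the same HWC session with `executeUpdate` before writing (sketch only; the `id`/`name` columns are placeholders and must match the DataFrame's actual schema):

```java
// Sketch: pre-create the target table via HWC, then write through the connector.
// NOTE: column names/types below are placeholders -- adapt them to df's schema.
HiveWarehouseSession hive = HiveWarehouseSession
        .session(spark)
        .userPassword(username, password)
        .build();
hive.setDatabase(db_name);

// Create the table explicitly so the connector write has a target to append to.
hive.executeUpdate("CREATE TABLE IF NOT EXISTS " + table_name
        + " (id INT, name STRING) STORED AS ORC");

df.write()
        .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
        .option("table", table_name)
        .mode("append")
        .save();
```

This requires a running Spark/Hive cluster with the HWC jar on the classpath, so it is not runnable standalone.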
Salvatore Nedia

1 Answer


Maybe you can try using STORED AS TEXTFILE when you create your table.
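For context, a minimal sketch of that suggestion (the schema here is a placeholder). Pre-creating the table with an explicit storage format keeps Hive 3 from treating it as a managed transactional (ACID, bucketed) table, which is the usual cause of "bucketId out of range: -1" when the data files were written by plain Spark:

```sql
-- Sketch: pre-create the table with an explicit, non-ACID layout.
-- Placeholder columns; match them to the data Spark will write.
CREATE TABLE IF NOT EXISTS db_name.table_name (
  id INT,
  name STRING
)
STORED AS TEXTFILE
TBLPROPERTIES ('transactional' = 'false');
```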

JoanJiao