A Hive table created through Spark (PySpark) cannot be read from Hive. The table is created with:
df.write.format("orc").mode("overwrite").saveAsTable("db.table")
Error when querying the table from Hive:
Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)
The table is created successfully, and I can read it back in Spark. The table metadata is accessible from Hive, and the data files are present in the table's HDFS directory.
The TBLPROPERTIES of the Hive table are:
'bucketing_version'='2',
'spark.sql.create.version'='2.3.1.3.0.0.0-1634',
'spark.sql.sources.provider'='orc',
'spark.sql.sources.schema.numParts'='1',
I also tried other ways of creating the table, but these fail with an error at creation time:
df.write.mode("overwrite").saveAsTable("db.table")
OR
df.createOrReplaceTempView("dfTable")
spark.sql("CREATE TABLE db.table AS SELECT * FROM dfTable")
Error:
AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Table default.src failed strict managed table checks due to the following reason: Table is marked as a managed table but is not transactional.);'
Stack version details:
Spark 2.3
Hive 3.1
Hortonworks Data Platform (HDP) 3.0