I've used Spark 3.3.1, configured with delta-core_2.12.2.2.0
and delta-storage-2.2.0
, to create several tables within an external database.
spark.sql("create database if not exists {database}.{table} location {path_to_storage}")
Within that database I've got several delta tables, created and populated through Spark, e.g.:
{table_df}.write.format("delta").mode("overwrite").saveAsTable("{database}.{table}")
I can then, right away, address it so:
df = spark.sql("select * from {database}.{table} limit 10")
df.show()
And everything works fine.
When I try to run the same command (select * from {database}.{table} limit 10;
) through hive, or dbeaver sql editor, I get the following error:
hive> select * from {database}.{table} limit 10;
2023-01-04T12:45:29,474 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 9ecfc0dd-0606-4060-98ef-b1395fc62456
2023-01-04T12:45:29,484 INFO [main] org.apache.hadoop.hive.ql.session.SessionState - Updating thread name to 9ecfc0dd-0606-4060-98ef-b1395fc62456 main
2023-01-04T12:45:31,138 INFO [9ecfc0dd-0606-4060-98ef-b1395fc62456 main] org.apache.hadoop.hive.common.FileUtils - Creating directory if it doesn't exist: hdfs://localhost:9000/tmp/hive/user/9ecfc0dd-0606-4060-98ef-b1395fc62456/hive_2023-01-04_12-45-29_879_2613740326746479876-1/-mr-10001/.hive-staging_hive_2023-01-04_12-45-29_879_2613740326746479876-1
OK
Failed with exception java.io.IOException:java.io.IOException: file:/{path_to_file_storage}/part-00000-7708a52c-0939-4288-b56a-ecdeea197574-c000.snappy.parquet not a SequenceFile
Time taken: 1.649 seconds
2023-01-04T12:45:31,298 INFO [9ecfc0dd-0606-4060-98ef-b1395fc62456 main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 9ecfc0dd-0606-4060-98ef-b1395fc62456
2023-01-04T12:45:31,298 INFO [9ecfc0dd-0606-4060-98ef-b1395fc62456 main] org.apache.hadoop.hive.ql.session.SessionState - Resetting thread name to main
hive>
I have installed the delta connectors (delta-hive-assembly_2.12-0.6.0.jar) from here:
https://github.com/delta-io/connectors/blob/master/hive/README.md
Installed it in an auxjar folder in my main hive directory and added the following properties in my hive-site.xml file:
<property>
<name>hive.input.format</name>
<value>io.delta.hive.HiveInputFormat</value>
</property>
<property>
<name>hive.tez.input.format</name>
<value>io.delta.hive.HiveInputFormat</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:/{path_to_file}/auxjar/delta-hive-assembly_2.12-0.6.0.jar</value>
</property>
When I start hive I'm not seeing an exceptions about the file not being found. Have I missed a critical step out?
Thanks
Tried running a simple query in hive, got an IOException.