
I've used Spark 3.3.1, configured with delta-core_2.12-2.2.0 and delta-storage-2.2.0, to create several tables within an external database.

spark.sql("create database if not exists {database}.{table} location {path_to_storage}")

Within that database I've got several delta tables, created and populated through Spark, e.g.:

{table_df}.write.format("delta").mode("overwrite").saveAsTable("{database}.{table}")

I can then, right away, address it so:

df = spark.sql("select * from {database}.{table} limit 10")
df.show()

And everything works fine.

When I try to run the same command (select * from {database}.{table} limit 10;) through Hive, or the DBeaver SQL editor, I get the following error:

hive> select * from {database}.{table} limit 10;
2023-01-04T12:45:29,474 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 9ecfc0dd-0606-4060-98ef-b1395fc62456
2023-01-04T12:45:29,484 INFO [main] org.apache.hadoop.hive.ql.session.SessionState - Updating thread name to 9ecfc0dd-0606-4060-98ef-b1395fc62456 main
2023-01-04T12:45:31,138 INFO [9ecfc0dd-0606-4060-98ef-b1395fc62456 main] org.apache.hadoop.hive.common.FileUtils - Creating directory if it doesn't exist: hdfs://localhost:9000/tmp/hive/user/9ecfc0dd-0606-4060-98ef-b1395fc62456/hive_2023-01-04_12-45-29_879_2613740326746479876-1/-mr-10001/.hive-staging_hive_2023-01-04_12-45-29_879_2613740326746479876-1
OK
Failed with exception java.io.IOException:java.io.IOException: file:/{path_to_file_storage}/part-00000-7708a52c-0939-4288-b56a-ecdeea197574-c000.snappy.parquet not a SequenceFile
Time taken: 1.649 seconds
2023-01-04T12:45:31,298 INFO [9ecfc0dd-0606-4060-98ef-b1395fc62456 main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 9ecfc0dd-0606-4060-98ef-b1395fc62456
2023-01-04T12:45:31,298 INFO [9ecfc0dd-0606-4060-98ef-b1395fc62456 main] org.apache.hadoop.hive.ql.session.SessionState - Resetting thread name to  main
hive>

I have installed the Delta Hive connector (delta-hive-assembly_2.12-0.6.0.jar) from here:

https://github.com/delta-io/connectors/blob/master/hive/README.md

I installed it in an auxjar folder in my main Hive directory and added the following properties to my hive-site.xml file:

<property>
    <name>hive.input.format</name>
    <value>io.delta.hive.HiveInputFormat</value>
</property>
<property>
    <name>hive.tez.input.format</name>
    <value>io.delta.hive.HiveInputFormat</value>
</property>
<property>
    <name>hive.aux.jars.path</name>
    <value>file:/{path_to_file}/auxjar/delta-hive-assembly_2.12-0.6.0.jar</value>
</property>

When I start Hive I'm not seeing any exceptions about the jar file not being found. Have I missed a critical step?

Thanks

I tried running a simple query in Hive and got an IOException.

  • Further detail on the connector answers my question, I think: – Joe Ingle Jan 04 '23 at 17:59
  • If a table in the Hive Metastore is created by other systems such as Apache Spark or Presto, can I use this connector to query it in Hive? No. If a table in the Hive Metastore is created by other systems such as Apache Spark or Presto, Hive cannot find the correct connector to read it. You can follow our instruction to create a new table with a different table name but point to the same path in Hive. Although it's a different table name, the underlying data will be shared by all of systems. We recommend to create different tables in different systems but point to the same path. – Joe Ingle Jan 04 '23 at 18:01
  • Yes, I was looking at the same thing; you will have to define a new table in Hive. But I guess there should not be any issue if you point the Hive table to the same data – Raid Jan 04 '23 at 18:41
  • Seems a bit naff though, doesn't it? I thought one of the main points was that the format was supplier agnostic? – Joe Ingle Jan 04 '23 at 19:04
  • yes, I guess https://github.com/delta-io/delta/issues/85 is trying to solve similar thing and https://github.com/delta-io/delta/issues/1045 – Raid Jan 04 '23 at 19:15

1 Answer


Hive Connector

See FAQs at base of page:

If a table in the Hive Metastore is created by other systems such as Apache Spark or Presto, can I use this connector to query it in Hive?

No. If a table in the Hive Metastore is created by other systems such as Apache Spark or Presto, Hive cannot find the correct connector to read it. You can follow our instruction to create a new table with a different table name but point to the same path in Hive. Although it's a different table name, the underlying data will be shared by all of systems. We recommend to create different tables in different systems but point to the same path.
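In practice this means declaring a second Hive table over the same Delta path using the connector's storage handler, per the connector README. A minimal sketch is below; the table name {table}_hive, the column list (id, name), and the location are placeholders, and the real columns must match the schema of the Delta table Spark wrote:

-- Hypothetical Hive table pointing at the existing Delta data written by Spark
CREATE EXTERNAL TABLE {database}.{table}_hive (
  id INT,      -- replace with the actual Delta table schema
  name STRING
)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION '{path_to_storage}/{table}';

Querying {database}.{table}_hive in Hive then reads the same underlying Delta files as Spark, while the Spark-created table entry in the metastore is left untouched.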

Joe Ingle