0

Is it possible to connect BigQuery to a Hive/Dataproc Metastore database? I don't want to load Hive tables (ORC or Parquet) into BigQuery's internal storage. If BigQuery can route its SQL to Hive, and Hive then runs the query on Spark, that would work. I considered using the Hive CLI instead of BigQuery to execute queries, but doing it through BigQuery would give us a unified interface for ad-hoc SQL. I also considered BigQuery external tables, which can point directly at the raw Parquet/ORC locations. However, the ORC tables are also ACID tables managed by Hive, so BigQuery reading the raw ORC files directly may result in inconsistent reads.
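For reference, the external-table option mentioned above would look roughly like the sketch below. It only builds the BigQuery `CREATE EXTERNAL TABLE` statement as a string; the helper name `external_table_ddl`, the table name, and the bucket path are hypothetical placeholders, and this does nothing to address the ACID consistency concern.

```python
def external_table_ddl(table, file_format, uris):
    """Build a BigQuery CREATE EXTERNAL TABLE statement.

    table       -- 'dataset.table' name (hypothetical placeholder)
    file_format -- 'PARQUET' or 'ORC'
    uris        -- list of gs:// paths to the raw files
    """
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (format = '{file_format}', uris = [{uri_list}])"
    )


# Hypothetical dataset and bucket names, for illustration only.
ddl = external_table_ddl(
    "my_dataset.orders_ext", "PARQUET", ["gs://my-bucket/warehouse/orders/*"]
)
print(ddl)
```

The resulting statement could be pasted into the BigQuery console or submitted through any BigQuery client.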

nir

2 Answers

1

It is possible to connect Hive/Dataproc to BigQuery by using the Spark BigQuery Connector. Note that Spark SQL supports Hive natively but not BigQuery; Spark reads from and writes to BigQuery through the spark-bigquery-connector.
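A minimal sketch of what using the connector from PySpark might look like, assuming the spark-bigquery-connector jar is already on the cluster's classpath. The helper names, table names, and bucket are hypothetical; `spark` is an existing `SparkSession` passed in by the caller.

```python
def read_bigquery_table(spark, table):
    """Load a BigQuery table into a Spark DataFrame through the connector."""
    return spark.read.format("bigquery").option("table", table).load()


def write_to_bigquery(df, table, temp_bucket):
    """Append a Spark DataFrame to a BigQuery table.

    temp_bucket is a GCS bucket the connector stages data in
    before loading it into BigQuery (indirect write).
    """
    (
        df.write.format("bigquery")
        .option("temporaryGcsBucket", temp_bucket)
        .mode("append")
        .save(table)
    )
```

On a Dataproc cluster this could be called as `read_bigquery_table(spark, "my_project.my_dataset.my_table")`. In both directions it is Spark that initiates the transfer.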

Poala Astrid
It seems the `Spark BigQuery Connector` is for Spark/Dataproc to read from, and possibly write to, BigQuery, but not the other way around. – nir Apr 27 '23 at 17:36
0

I was able to achieve this by using the BigLake Metastore catalog. I stumbled upon this document while I was also looking to expose Apache Iceberg external tables to BigQuery. It seems you can use the same catalog to export Hive (or Dataproc Metastore) tables to BigQuery as well.
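As a rough illustration of the Iceberg case, the Spark session needs a catalog backed by BigLake Metastore. The sketch below only assembles the Spark properties into a dict. The property names and the `BigLakeCatalog` class are written from memory of Google's BigLake Metastore documentation and should be treated as assumptions to verify there; the helper name and every value are hypothetical placeholders.

```python
def biglake_catalog_conf(catalog, project, location, warehouse):
    """Return Spark properties for an Iceberg catalog on BigLake Metastore.

    NOTE: property names are assumptions recalled from the BigLake
    Metastore docs; verify them against the current documentation.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.gcp.biglake.BigLakeCatalog",
        f"{prefix}.gcp_project": project,
        f"{prefix}.gcp_location": location,
        f"{prefix}.blms_catalog": catalog,
        f"{prefix}.warehouse": warehouse,
    }


# Hypothetical project, region, and bucket, for illustration only.
conf = biglake_catalog_conf(
    "my_catalog", "my-project", "us-central1", "gs://my-bucket/iceberg"
)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed `--conf` pairs could then be passed to `spark-submit` or `spark-sql` on Dataproc.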

nir