
By default, Cloud Dataproc runs a Hive Metastore local to the Dataproc cluster. This means:

  1. The metastore is ephemeral: it is deleted along with the cluster

  2. Sharing a single metastore across multiple clusters is awkward

Is it possible to point multiple Dataproc clusters to a single Hive metastore? Is it also possible to have the metastore live outside the clusters, so that no cluster has to stay running just to host it?

James

1 Answer


Yes, this is possible - clusters can use a common metastore running on Cloud SQL.

Cloud Dataproc clusters can use the Cloud SQL Proxy to connect to the external Cloud SQL metastore. Before using this solution, you should review the important notes.
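
As a rough sketch of what this could look like, the Python snippet below only builds (it does not run) a `gcloud dataproc clusters create` command for each cluster that should share the metastore. Everything specific in it is an assumption to verify against the Cloud SQL Proxy documentation: that the proxy is installed through an initialization action script, that the cluster needs the `sql-admin` scope, and that the Cloud SQL instance is passed through a `hive-metastore-instance` metadata key. The cluster names, region, instance name, and bucket path are hypothetical placeholders.

```python
import shlex


def build_create_command(cluster_name, region, metastore_instance, init_action_uri):
    """Return the argv list for creating a cluster that uses a shared metastore.

    Assumptions (check against the Cloud SQL Proxy docs before use):
      - the proxy is set up by an initialization action at init_action_uri
      - the cluster needs the sql-admin scope to reach Cloud SQL
      - the instance is passed via the hive-metastore-instance metadata key
    """
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--region", region,
        "--scopes", "sql-admin",
        "--initialization-actions", init_action_uri,
        "--metadata", f"hive-metastore-instance={metastore_instance}",
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with your own project, region, and paths.
    instance = "my-project:us-central1:hive-metastore"
    init_action = "gs://my-bucket/cloud-sql-proxy.sh"

    # Every cluster is pointed at the same Cloud SQL instance.
    for name in ("etl-cluster", "adhoc-cluster"):
        cmd = build_create_command(name, "us-central1", instance, init_action)
        print(shlex.join(cmd))
```

Because each cluster is given the same instance name, the metastore lives in Cloud SQL and should outlive any individual cluster.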

James