Questions tagged [google-cloud-dataproc-metastore]

11 questions
4
votes
1 answer

Function of Dataproc Metastore in a Datalake environment

In a Google Datalake environment, what is the Dataproc Metastore service used for? I'm watching a Google Cloud Tech video, and around the 17:33 mark the presenter says: The other thing that is required, in order to make the data…
3
votes
1 answer

How can we interact with Dataproc Metastore to fetch a list of databases and tables?

I am using Dataproc Metastore as a metastore service with GCP. How can I interact with it to fetch the list of databases and tables? Is it possible to do this without running a Dataproc cluster? Edit - I have to fetch the metadata without running…
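A minimal sketch of one approach: Dataproc Metastore exposes a standard Hive Metastore Thrift endpoint, so a plain Thrift client can list databases and tables without any Dataproc cluster. This assumes the third-party `hmsclient` package and network access to the endpoint; the host and port below are placeholders.

```python
# Sketch (assumptions: the metastore's Thrift endpoint is reachable and the
# third-party `hmsclient` package is installed). No Dataproc cluster needed.

def list_metastore_contents(host, port=9083):
    """Return {database: [table, ...]} from a Hive Metastore Thrift endpoint."""
    from hmsclient import hmsclient  # pip install hmsclient (third-party)

    with hmsclient.HMSClient(host=host, port=port) as client:
        return {db: client.get_all_tables(db)
                for db in client.get_all_databases()}
```

The service's endpoint URI is shown in its description in the Cloud Console; 9083 is the Hive Metastore default port.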
2
votes
0 answers

Is there any way to get the error code and error message directly from the Dataproc API

We are currently creating Dataproc clusters using the sample code below: from google.cloud import dataproc_v1 def sample_create_cluster(): # Create a client client = dataproc_v1.ClusterControllerClient() # Initialize request argument(s) …
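One way to surface the code and message: API errors raised by the client are `google.api_core.exceptions.GoogleAPICallError` subclasses, which carry the status code and message directly. A sketch assuming the google-cloud-dataproc library is installed; the function and its parameters are illustrative, not the asker's actual code.

```python
# Sketch (assumption: google-cloud-dataproc is installed). Errors raised by
# the client derive from GoogleAPICallError, which exposes .code and .message.

def create_cluster_or_report(project_id, region, cluster):
    from google.api_core.exceptions import GoogleAPICallError
    from google.cloud import dataproc_v1

    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    try:
        operation = client.create_cluster(
            project_id=project_id, region=region, cluster=cluster
        )
        return operation.result()  # blocks until the cluster is ready
    except GoogleAPICallError as err:
        # err.code is the status code, err.message the human-readable detail
        print(f"Dataproc error {err.code}: {err.message}")
        raise
```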
2
votes
1 answer

Unable to find DataprocCreateClusterOperator configs : Dataproc Metastore

I have been looking for cluster configs in JSON format to create a Dataproc cluster (GCE) with the Dataproc Metastore service and Spark-BQ dependency jars, but I am unable to find any reference document that specifies how to use those JSON configs. I have looked…
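For reference, a minimal `cluster_config` that attaches a Dataproc Metastore service can be sketched as below; the Dataproc `ClusterConfig` message has a `metastore_config.dataproc_metastore_service` field for this. All resource names and machine types here are placeholders.

```python
# Sketch: a minimal cluster_config for DataprocCreateClusterOperator that
# attaches a Dataproc Metastore service. All resource names are placeholders.
CLUSTER_CONFIG = {
    "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
    "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    "metastore_config": {
        # Full resource name of the Dataproc Metastore service to attach.
        "dataproc_metastore_service": (
            "projects/my-project/locations/us-central1/services/my-metastore"
        ),
    },
}
```

This dict would be passed as `cluster_config=CLUSTER_CONFIG` to the operator.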
2
votes
1 answer

Special characters in Dataproc partition columns

I'm using Spark 3.1.2 on Google Dataproc image version 2.0.15-debian10 with Dataproc managed metastore version 3.1.2. The following snippet works fine with a GCS-backed table mydb.mytable: from pyspark.sql import…
1
vote
1 answer

Iceberg - MERGE INTO TABLE is not supported temporarily

I tried to merge data from a Parquet file into an Iceberg table and got java.lang.UnsupportedOperationException: MERGE INTO TABLE is not supported temporarily. I use Spark 3.3.0 with Iceberg 1.1.0 running on a Dataproc cluster which is already attached to a Dataproc metastore…
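This error typically means Spark's own analyzer planned the MERGE INTO statement instead of Iceberg's. A hedged spark-defaults sketch that enables Iceberg's SQL extensions so MERGE INTO is handled by Iceberg (a common remedy, assuming the Iceberg Spark runtime jar is already on the classpath):

```
# Enable Iceberg's SQL extensions so MERGE INTO is planned by Iceberg
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
# Route the session catalog through Iceberg, backed by the Hive metastore
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type=hive
```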
1
vote
0 answers

Apache Iceberg on GCS atomic rename

I have a Spark on Dataproc Serverless use case which requires reading/writing the Iceberg format on GCS. Reading through the documentation, I realized that I cannot use the Hadoop table catalog because GCS does not support atomic rename: A Hadoop catalog…
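One workaround consistent with that documentation: a Hive-type catalog (for example, an attached Dataproc Metastore) commits table metadata through the metastore rather than through file renames, so GCS's lack of atomic rename is not a blocker. A hedged config sketch; the catalog name and bucket path are placeholders:

```
spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.type=hive
spark.sql.catalog.iceberg.warehouse=gs://my-bucket/iceberg-warehouse
```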
0
votes
1 answer

Does Google provide technical support for Dataproc's optional components, e.g. Ranger?

Does Google provide technical support for Dataproc's optional components, e.g. Ranger? If yes, can someone share a link to verify?
0
votes
0 answers

Can we create an external table in Trino with an external location in Amazon S3 or Azure Blob Storage using Dataproc Metastore?

I had set up a Trino cluster in GKE, using the Hive connector with the Dataproc Metastore service to create external tables with an external location in GCS. But when I tried to create external tables in Azure Blob Storage with an external location,…
0
votes
2 answers

Connecting BigQuery to Dataproc Metastore/Hive tables

Is it possible to connect BigQuery to a Hive/Dataproc Metastore database? I don't want to load Hive tables (ORC or Parquet) into BigQuery's internal storage. If BigQuery can route its SQL to Hive, and Hive then runs the query on Spark, that works. I…
0
votes
1 answer

Couldn't connect to DPMS while creating a Dataproc cluster using an Airflow operator

I have a Dataproc Metastore service (in the same project as Composer's) and am trying to use it instead of my Hive warehouse. I could run this successfully using gcloud commands, but when I try to use any Airflow operators…