Questions tagged [google-cloud-dataproc-serverless]

25 questions
1
vote
1 answer

How to set up a venv or similar setup to run PySpark jobs on GCP Dataproc Serverless Spark without installing packages in the container image

I am working on a project where we want to release a Serverless Spark container image to a set of customers, who will use this image to run their Serverless Spark workloads. But to run PySpark jobs in order to install the packages manually on the image (as…
1
vote
1 answer

How to enable Component Gateway and Jupyter Notebook on a GCP Dataproc cluster after the cluster is created

We have a cluster created and running in GCP, and we want to add the Component Gateway with Jupyter Notebook. I know this is possible when the cluster is being created for the first time. If the cluster has already been created, are we able to enable the component gateways…
1
vote
1 answer

Serverless Spark job throwing an error while using a shared VPC to connect to on-prem storage

I am trying to run a simple serverless Spark (Dataproc batch) job which reads an object from on-prem ECS through a shared VPC. I have opened an egress firewall rule in the shared VPC to connect to the on-prem storage, but I don't see that firewall rule getting hit. There are very…
1
vote
1 answer

How to pass a custom job ID to a Google Dataproc cluster job for Spark using the Dataproc client

I am using the following code snippet but have not had any luck. Can anyone help me pass a custom job ID? job = { "placement": {"cluster_name": cluster_name}, "spark_job": { "main_class": "org.example.App", …
1
vote
0 answers

Dataproc: How to implement autoscaling based on Presto load

I am using Presto as an optional component on Dataproc. I want to autoscale based on the load of Presto queries, but when I set autoscaling-policies and increase the query load, it does not autoscale. I read the following…
1
vote
1 answer

Reducing Dataproc Serverless CPU quota

Aim: I want to run Spark jobs on Dataproc Serverless for Spark. Problem: The minimum CPU requirement for a Spark app is 12 cores. That doesn't fit into the default regional CPU quota we have and requires us to expand it. 12 cores is an…
1
vote
1 answer

Dataproc on GKE via Terraform not working (example provided by Terraform doc)

I was running some tests in my GCP project to verify whether I can migrate to Dataproc on GKE and keep it up and running while leveraging autoscaling for workloads. However, I'm blocked from the beginning. Picking the example from the doc, placed…
0
votes
1 answer

Does Google provide technical support for Dataproc's optional components, e.g. Ranger?

Does Google provide technical support for Dataproc's optional components, e.g. Ranger? If yes, can someone leave a link to verify?
0
votes
0 answers

GCP Dataproc serverless - Failed to load class of driverClassName com.microsoft.sqlserver.jdbc.SQLServerDriver

I have a Java Spark job written using Spring Boot. I need to call an MS SQL DB to get the prop/values before starting the Spark operations. Locally, I was able to run the Spark job by passing the driver class in the Spark extra classpath. I have…
0
votes
2 answers

Dataproc Workflow (ephemeral cluster) or Dataproc Serverless for batch processing?

GCP Dataproc offers both serverless (Dataproc Serverless) and ephemeral cluster (Dataproc Workflow template) options for Spark batch processing. If Dataproc Serverless can hide infrastructure complexity, I wonder what the business use case could be for using…