Questions tagged [google-cloud-dataproc-serverless]
25 questions
1
vote
1 answer
How to set up a venv to run PySpark jobs on GCP Dataproc Serverless Spark without installing packages in the container image
I am working on a project where we want to release a Serverless Spark container image to a set of customers, who will use this image to run their Serverless Spark workloads.
But to run PySpark jobs we would have to install the packages manually on the image (as…

ash_ketchum12
- 73
- 6
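One commonly suggested approach here is to pack the virtual environment with venv-pack, stage the archive on GCS, and point the batch at it instead of baking packages into the container image. A minimal sketch with the google-cloud-dataproc Python client; the project, bucket, and file names are placeholders:

from google.cloud import dataproc_v1

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/main.py",
        # The packed venv is extracted on each node under the alias after "#".
        archive_uris=["gs://my-bucket/envs/my_venv.tar.gz#environment"],
    ),
    runtime_config=dataproc_v1.RuntimeConfig(
        properties={
            # Point both driver and executors at the Python inside the archive.
            "spark.pyspark.python": "environment/bin/python",
            "spark.pyspark.driver.python": "environment/bin/python",
        }
    ),
)

operation = client.create_batch(
    parent="projects/my-project/locations/us-central1",
    batch=batch,
    batch_id="venv-demo-batch",
)
print(operation.result())

Because Spark extracts the archive under the "environment" alias, the interpreter path stays stable across runs regardless of the archive's file name.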
1
vote
1 answer
How to enable Component Gateway and Jupyter notebook on a GCP Dataproc cluster after the cluster is created
We have a cluster created and running in GCP, and we want to include the Component Gateway and Jupyter notebook. I know that can be done when the cluster is created for the first time. Once the cluster has been created, are we able to enable the component gateways…

Mani Shankar.S
- 39
- 6
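As far as the Dataproc API goes, Component Gateway and optional components are specified in the cluster config at creation time, so the usual answer is to recreate the cluster with them enabled. A minimal sketch with the Python client; the project and cluster names are placeholders:

from google.cloud import dataproc_v1

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-project",
    "cluster_name": "jupyter-cluster",
    "config": {
        # Component Gateway and optional components are set at creation time.
        "endpoint_config": {"enable_http_port_access": True},
        "software_config": {"optional_components": ["JUPYTER"]},
    },
}

operation = client.create_cluster(
    request={
        "project_id": "my-project",
        "region": "us-central1",
        "cluster": cluster,
    }
)
print(operation.result())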
1
vote
1 answer
Serverless Spark job throwing an error while using a shared VPC to connect to on-prem storage
I am trying to run a simple serverless Spark (Dataproc batch) job which reads an object from on-prem ECS over a shared VPC. I have opened an egress firewall rule in the shared VPC to reach the on-prem storage, but I don't see that firewall rule getting hit.
There are very…

Jaysukh Kalasariya
- 73
- 1
- 7
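Worth noting: Dataproc Serverless batches attach to the network through the batch's execution config, and shared-VPC firewall rules can be scoped with network tags. A hedged sketch of submitting a batch pinned to a shared-VPC subnet, with a tag that an egress rule can match; all resource names are placeholders:

from google.cloud import dataproc_v1

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    spark_batch=dataproc_v1.SparkBatch(
        main_class="org.example.App",
        jar_file_uris=["gs://my-bucket/jars/app.jar"],
    ),
    environment_config=dataproc_v1.EnvironmentConfig(
        execution_config=dataproc_v1.ExecutionConfig(
            # Full resource path of the shared-VPC subnet in the host project.
            subnetwork_uri=(
                "projects/host-project/regions/us-central1/subnetworks/shared-subnet"
            ),
            # Tags let egress firewall rules target these batch VMs explicitly.
            network_tags=["dataproc-serverless-egress"],
        )
    ),
)

operation = client.create_batch(
    parent="projects/service-project/locations/us-central1",
    batch=batch,
    batch_id="shared-vpc-batch",
)
print(operation.result())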
1
vote
1 answer
How to pass a custom job ID to a Google Dataproc Spark job using the Dataproc client
I am using the following code snippet but have not had any luck.
Can anyone help me pass a custom job ID?
job = {
    "placement": {"cluster_name": cluster_name},
    "spark_job": {
        "main_class": "org.example.App",
        …

Faisal Khan
- 85
- 5
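For reference, the Python client accepts a caller-chosen job ID through the job's reference field. A minimal sketch that completes the snippet above; the project, region, cluster, and jar names are placeholders:

from google.cloud import dataproc_v1

client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

job = {
    # The reference block is where a caller-chosen job ID goes.
    "reference": {"job_id": "my-custom-job-id-001"},
    "placement": {"cluster_name": "my-cluster"},
    "spark_job": {
        "main_class": "org.example.App",
        "jar_file_uris": ["gs://my-bucket/jars/app.jar"],
    },
}

result = client.submit_job(
    request={"project_id": "my-project", "region": "us-central1", "job": job}
)
print(result.reference.job_id)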
1
vote
0 answers
Dataproc: How to implement autoscaling based on Presto load
I have added Presto to Dataproc as an optional component.
I want to autoscale based on Presto query load, but when I set an autoscaling policy and increase the query load, the cluster does not autoscale.
I read the following…

yumetukushi
- 11
- 1
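Context that usually explains this: Dataproc autoscaling policies react to YARN pending/available memory, so load from engines that run outside YARN, such as Presto, does not move those metrics. For completeness, a sketch of creating a policy with the Python client; the names, factors, and timeouts are illustrative:

from google.cloud import dataproc_v1

client = dataproc_v1.AutoscalingPolicyServiceClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

policy = {
    "id": "yarn-based-policy",
    # This algorithm scales on YARN memory pressure only; Presto queries
    # do not register as YARN load and so will not trigger scaling.
    "basic_algorithm": {
        "yarn_config": {
            "scale_up_factor": 0.5,
            "scale_down_factor": 0.5,
            "graceful_decommission_timeout": {"seconds": 600},
        },
        "cooldown_period": {"seconds": 120},
    },
    "worker_config": {"min_instances": 2, "max_instances": 10},
}

client.create_autoscaling_policy(
    parent="projects/my-project/regions/us-central1", policy=policy
)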
1
vote
1 answer
Reducing Dataproc Serverless CPU quota
Aim: I want to run Spark jobs on Dataproc Serverless for Spark.
Problem: The minimum CPU requirement is 12 cores per Spark app. That doesn't fit into the default regional CPU quota we have and requires us to expand it. 12 cores is an…

doeJohn
- 11
- 2
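Since the 12-core floor comes from the smallest allowed shape (a 4-core driver plus two 4-core executors), the practical choices are raising the quota or pinning batches at that minimum. A hedged sketch of the minimal footprint via runtime properties; the project, bucket, and batch names are placeholders:

from google.cloud import dataproc_v1

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/main.py"
    ),
    runtime_config=dataproc_v1.RuntimeConfig(
        properties={
            # The smallest allowed shape: a 4-core driver plus two 4-core
            # executors, which is where the 12-core floor comes from.
            "spark.driver.cores": "4",
            "spark.executor.cores": "4",
            "spark.executor.instances": "2",
            # Disable dynamic allocation so the batch cannot scale past this.
            "spark.dynamicAllocation.enabled": "false",
        }
    ),
)

operation = client.create_batch(
    parent="projects/my-project/locations/us-central1",
    batch=batch,
    batch_id="min-footprint-batch",
)
print(operation.result())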
1
vote
1 answer
Dataproc on GKE via Terraform not working (example provided by the Terraform docs)
I was doing some tests in my GCP project to verify whether I can migrate Dataproc to GKE and keep it up and running, while leveraging autoscaling for workloads.
However, I'm blocked from the beginning.
Picking the example from the doc, placed…

Marco Massetti
- 539
- 4
- 12
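One way to debug the Terraform example is to issue the equivalent virtual-cluster request directly through the Python client and inspect the raw API error. A rough sketch, assuming an existing GKE cluster and node pool; every resource name and the component version below are placeholders:

from google.cloud import dataproc_v1

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-project",
    "cluster_name": "dp-on-gke",
    "virtual_cluster_config": {
        "staging_bucket": "my-staging-bucket",
        "kubernetes_cluster_config": {
            "kubernetes_namespace": "dataproc",
            "gke_cluster_config": {
                # Target an existing GKE cluster and node pool.
                "gke_cluster_target": (
                    "projects/my-project/locations/us-central1/clusters/my-gke"
                ),
                "node_pool_target": [
                    {
                        "node_pool": (
                            "projects/my-project/locations/us-central1/"
                            "clusters/my-gke/nodePools/dp-pool"
                        ),
                        "roles": ["DEFAULT"],
                    }
                ],
            },
            "kubernetes_software_config": {
                "component_version": {"SPARK": "3.1-dataproc-7"}
            },
        },
    },
}

operation = client.create_cluster(
    request={"project_id": "my-project", "region": "us-central1", "cluster": cluster}
)
print(operation.result())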
0
votes
1 answer
Does Google provide technical support for Dataproc's optional components, e.g. Ranger?
Does Google provide technical support for Dataproc's optional components, e.g. Ranger?
If yes, can someone leave a link to verify?

ciao_bella
- 31
- 1
- 6
0
votes
0 answers
GCP Dataproc serverless - Failed to load class of driverClassName com.microsoft.sqlserver.jdbc.SQLServerDriver
I have a Java Spark job written using Spring Boot. I need to call an MS SQL DB to get the prop/values before starting the Spark operations. Locally, I was able to run the Spark job by passing the driver class in the Spark extra classpath.
I have…

Baskar Gopal
- 1
- 1
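A commonly suggested fix for Serverless, where there is no node-local classpath to extend: stage the JDBC driver jar on GCS and list it in jar_file_uris so it lands on both the driver and executor classpaths. A minimal sketch; the bucket, jar names, and driver version are placeholders:

from google.cloud import dataproc_v1

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    spark_batch=dataproc_v1.SparkBatch(
        main_class="com.example.SpringSparkJob",
        jar_file_uris=[
            "gs://my-bucket/jars/app.jar",
            # Staging the JDBC driver here puts it on both the driver and
            # executor classpaths, replacing the local extraClassPath trick.
            "gs://my-bucket/jars/mssql-jdbc-12.4.2.jre11.jar",
        ],
    ),
)

operation = client.create_batch(
    parent="projects/my-project/locations/us-central1",
    batch=batch,
    batch_id="mssql-driver-batch",
)
print(operation.result())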
0
votes
2 answers
Dataproc Workflow (ephemeral cluster) or Dataproc Serverless for batch processing?
GCP Dataproc offers both serverless (Dataproc Serverless) and ephemeral clusters (Dataproc Workflow templates) for Spark batch processing.
If Dataproc Serverless can hide infrastructure complexity, I wonder what the business use case could be for using…

Rashmit Rathod
- 753
- 7
- 11