Questions tagged [dataproc]

130 questions
0
votes
0 answers

Physical mem used % and Physical Vcores Used % in Spark 3 on YARN

I would like to understand what "Physical mem used %" and "Physical Vcores Used %" are in Spark 3 on YARN. I don't see these metrics in Spark 2.4, but I can see these new metrics in Spark 3 on YARN. What is Physical mem used %? What is Physical Vcores Used %? Even…
Mac
  • 1
0
votes
1 answer

Dataproc Hadoop/Spark job cannot connect to Cloud SQL via private IP

I am facing an issue setting up private IP access between Dataproc and Cloud SQL with a VPC network and peering configured. I would really appreciate help, since I have not been able to figure this out over the last two days of debugging, after following pretty much all…
0
votes
1 answer

GCP serverless PySpark: Illegal character in path at index

I'm trying to run a simple hello-world Python script on serverless PySpark on GCP using gcloud (from a local Windows machine). if __name__ == '__main__': print("Hello") This always results in the error =========== Cloud Dataproc Agent Error…
Pankaj
  • 2,220
  • 1
  • 19
  • 31
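
For reference, a hedged sketch of the submit command (the file name, region, and bucket are placeholders). One plausible cause of the "Illegal character in path" URI error on Windows is a backslash-style local path, so passing a forward-slash path or a gs:// URI is worth trying, though that is an assumption:

```
gcloud dataproc batches submit pyspark hello.py \
    --region=us-central1 \
    --deps-bucket=gs://my-staging-bucket
```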
0
votes
0 answers

How to get Dask job logs into the GCP Logs Explorer?

I'm using Dataproc to run my Dask job, written in Python. I can catch every log except those from the distributed (lazy) computation. I'm using Google Cloud Logging. The way I catch logs: logging.getLogger(__name__).exception(e,…
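
Logs emitted inside Dask tasks are produced on the workers, not in the client process, so client-side logging setup never sees them. A minimal sketch, assuming the standard `dask.distributed` `Client.run` API to configure logging on every worker, so that worker stdout/stderr (which Dataproc forwards to Cloud Logging) carries the task-side records:

```python
import logging

def setup_worker_logging(level=logging.INFO):
    # Runs once per worker: route task-side log records to stdout/stderr,
    # which Dataproc collects and ships to Cloud Logging.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,  # replace any handlers the worker already installed
    )
    return logging.getLogger().getEffectiveLevel()

# On the client (scheduler address is a placeholder):
# from dask.distributed import Client
# client = Client("scheduler-host:8786")
# client.run(setup_worker_logging)  # executes on every worker
```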
0
votes
1 answer

Dataproc PySpark job: total bytes billed

I have a PySpark job that I submitted via Dataproc. I would like to know how much data my job used, or in other words, how much GCP is going to bill me. I looked at the INFORMATION_SCHEMA tables, but those don't show jobs run via Dataproc. I am…
0
votes
0 answers

How to run SHOW PARTITIONS on a Hive table using PySpark?

I am trying to run SHOW PARTITIONS on a Hive table using PySpark, but it fails with the error below. I am using a Dataproc cluster on GCP to run the PySpark job. ivysettings.xml file not found in HIVE_HOME or…
majain
  • 31
  • 5
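
A minimal sketch of the usual approach, assuming a Hive-enabled SparkSession (the table name is a placeholder; on a Dataproc cluster the Hive metastore is preconfigured). The ivysettings.xml message itself may be only a warning from Hive's dependency resolver, with the real failure appearing later in the stack trace — that is an assumption to verify against the full error output:

```python
def partitions_sql(table):
    # Build the statement separately so it can be reused and tested.
    return f"SHOW PARTITIONS {table}"

def show_partitions(table):
    # Imported inside the function so the helper above stays usable
    # on machines without PySpark installed.
    from pyspark.sql import SparkSession
    spark = (
        SparkSession.builder
        .appName("show-partitions")
        .enableHiveSupport()  # required to reach the Hive metastore
        .getOrCreate()
    )
    spark.sql(partitions_sql(table)).show(truncate=False)

# On the cluster (placeholder table name):
# show_partitions("mydb.mytable")
```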
0
votes
0 answers

PySpark jobs on Dataproc using documents from Firestore

I need to run some simple PySpark jobs on big data stored in Google's Firestore. The dataset contains 42 million documents regarding Instagram posts. I want to do some simple aggregations, like summing the number of likes per country…
0
votes
0 answers

GCP Dataproc: Scala SSH tunnel to an Oracle database

I'm trying to run a Spark job from GCP Dataproc. It mainly reads data from AWS Oracle databases, so it connects to them via SSH. It works fine locally, but not in the GCP Dataproc cluster. import com.jcraft.jsch.Session import…
0
votes
0 answers

Dataproc PHS: YARN RM UI unable to read logs from remote-app-log-dir

I am working on setting up a Dataproc Persistent History Server (PHS) for my Spark and Hive applications. I was able to successfully set up the Spark History Server in a standalone Dataproc cluster (PHS) with the following…
0
votes
0 answers

Configuring a YAML file for Cloud Workflows

I want to write a YAML file that creates a workflow to schedule and run my Dataflow and Dataproc Serverless jobs. Can you help me? I tried: # This is a sample cloud workflow YAML file for scheduling Dataflow jobs. name: Scheduled Dataflow Job #…
0
votes
0 answers

Set up a local Docker image for the Google Dataproc service

I am having trouble setting up the Google Docker image for the Dataproc service. I tried the steps in the Stack Overflow answer below: https://stackoverflow.com/questions/69555415/gcp-dataproc-base-docker-image/74715158#74715158 but I am getting the error below. PS…
0
votes
0 answers

How to choose the preview-debian11 (dataproc-release-2.1) image when creating a Dataproc cluster

How can I get the Debian 11 (dataproc-release-2.1) image to create a Dataproc cluster? I found that Dataproc provides a preview-debian11 version, per the link below: https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-version-clusters I need…
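
A hedged sketch of the create command, assuming the preview track is selected via the `--image-version` flag (cluster name and region are placeholders):

```
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=preview-debian11
```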
0
votes
0 answers

Sqoop Avro dependency issue in the GCP Dataproc 2.0.49 image

I am facing a JAR dependency issue while connecting to an Oracle database using Sqoop. I am able to connect to the database, but not able to get the data from Oracle in Avro format. The error message is: [2022-11-22 05:43:40,031] {subprocess.py:92} INFO - Exception in…
Siri Vali
  • 1
  • 1
0
votes
1 answer

How to use a new Spark context

I am currently running a Jupyter notebook on GCP Dataproc and hoping to increase the memory available via my config. I first stopped my Spark context: import pyspark sc = spark.sparkContext sc.stop() Waited until running the next code block so…
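
A hedged sketch of the stop-and-rebuild pattern using Spark 3's SparkSession API (the memory values are illustrative). Note that on YARN some properties, notably driver memory, generally only take effect when the application starts, so a full kernel restart may still be required — that caveat is an assumption to verify for the specific deploy mode:

```python
def memory_conf(executor_mem="8g", driver_mem="8g"):
    # Illustrative values; size them to the cluster's machine types.
    return {
        "spark.executor.memory": executor_mem,
        "spark.driver.memory": driver_mem,
    }

def restart_spark(conf):
    # Imported here so memory_conf stays usable without PySpark.
    from pyspark.sql import SparkSession
    active = SparkSession.getActiveSession()
    if active is not None:
        active.stop()  # stop the old context before rebuilding
    builder = SparkSession.builder.appName("resized-session")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()

# In the notebook:
# spark = restart_spark(memory_conf(executor_mem="12g"))
```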
0
votes
2 answers

How to configure an alerting policy for a failed Dataproc batch?

I want to alert on the failure of any serverless Dataproc job. I think I may need to create a log-based metric and then an alerting policy based on that metric. I tried creating an alerting policy with the filter below: filter =…
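
One common pattern is a log-based metric over batch failure entries, with the alerting policy defined on that metric. A hedged sketch of a possible Logs Explorer filter — the resource type and severity threshold are assumptions that should be checked against the actual entries a failed batch produces in Logs Explorer:

```
resource.type="cloud_dataproc_batch"
severity>=ERROR
```

The metric itself can then be created with `gcloud logging metrics create`, and the alerting policy pointed at it in Cloud Monitoring.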