Questions tagged [dataproc]
130 questions
0
votes
1 answer
Connect PySpark session to DataProc
I'm trying to connect a PySpark session running locally to a Dataproc cluster. I want to be able to work with files on GCS without downloading them. My goal is to perform ad-hoc analyses using local Spark, then switch to a larger cluster when I'm…

oneextrafact
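For the local-session question above, a minimal sketch of the Spark properties that let a local PySpark session read gs:// paths. Assumptions not stated in the question: the GCS connector jar is already on the driver classpath, and the key-file path is a placeholder.

```python
# Sketch only: Spark properties for reading gs:// paths from a local session.
# Assumes the gcs-connector jar is on the driver classpath and that
# /path/to/key.json (placeholder) is a valid service-account key file.
gcs_conf = {
    # Register the GCS connector as the handler for gs:// URIs.
    "spark.hadoop.fs.gs.impl":
        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "spark.hadoop.fs.AbstractFileSystem.gs.impl":
        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    # Authenticate with a service-account JSON key.
    "spark.hadoop.google.cloud.auth.service.account.enable": "true",
    "spark.hadoop.google.cloud.auth.service.account.json.keyfile":
        "/path/to/key.json",
}

# Applying the properties when building the session (requires pyspark):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("local-gcs")
# for key, value in gcs_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
# df = spark.read.text("gs://my-bucket/some/file.txt")
```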
0
votes
1 answer
How do I set up sparkmagic to work with DataProc through Livy?
I have a DataProc cluster running in GCP. I ran the Livy initialization script for it, and I can access the livy/sessions link through the gateway interface. I have the following set up for my sparkmagic config.json:
{
…

oneextrafact
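For the sparkmagic question, a sketch of the relevant section of `~/.sparkmagic/config.json`, built as a Python dict. The URL is a placeholder: it assumes Livy's default port 8998 is reachable from the machine running Jupyter, e.g. through an SSH tunnel to the cluster.

```python
import json

# Sketch of the kernel_python_credentials section of sparkmagic's config.json.
# The URL is a placeholder; it assumes Livy (default port 8998) is reachable
# locally, e.g. via an SSH tunnel to the Dataproc master.
sparkmagic_config = {
    "kernel_python_credentials": {
        "username": "",
        "password": "",
        "url": "http://localhost:8998",  # tunnel endpoint to Livy
        "auth": "None",
    },
    # Give slow clusters more time before sparkmagic gives up on the session.
    "livy_session_startup_timeout_seconds": 120,
}
config_json = json.dumps(sparkmagic_config, indent=2)
```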
0
votes
1 answer
How to view output files from Dataproc job on Google Cloud Platform
How can I view the contents of the output files from my dataproc job?
Is this something I need to change in the code I've written for the dataproc .jar file?
This is my storage bucket for the output of the job

dvb
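For the output-files question, a sketch of where Spark/Hadoop job output lands and how to read it. The bucket name and prefix are placeholders; a job typically writes one `part-*` file per task plus a `_SUCCESS` marker.

```python
# Sketch of inspecting Dataproc job output in GCS (assumed layout: the job
# wrote its results under gs://<bucket>/output/). The "output" is the set of
# part-* files, not a single file.
bucket = "my-bucket"   # placeholder bucket name
prefix = "output/"     # placeholder output directory

# From a shell, the quickest check is:
#   gsutil ls  gs://my-bucket/output/          # list the part files
#   gsutil cat gs://my-bucket/output/part-*    # print their contents
#
# Or with the (non-stdlib) google-cloud-storage client:
# from google.cloud import storage
# for blob in storage.Client().list_blobs(bucket, prefix=prefix):
#     print(blob.name)

output_glob = f"gs://{bucket}/{prefix}part-*"
```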
0
votes
1 answer
How to add bigquery-connector to an existing cluster on dataproc
I've just started using Dataproc to do machine learning on big data in BigQuery. When I try to run this code:
df = spark.read.format('bigquery').load('bigquery-public-data.samples.shakespeare')
I get an error that looks something like this…

Kerem Tatlıcı
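For the connector question above, a sketch of attaching the published spark-bigquery connector jar per job instead of recreating the cluster. The jar path is Google's public one; cluster and region names would be your own, and the Scala suffix (`_2.12`) has to match the cluster's Spark build.

```python
# Sketch: attach the spark-bigquery connector jar at job-submission time
# rather than reinstalling the cluster. The gs://spark-lib path is Google's
# published location; the _2.12 suffix must match the cluster's Scala version.
bq_jar = "gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"

# gcloud side (cluster/region are placeholders):
#   gcloud dataproc jobs submit pyspark my_job.py \
#       --cluster=my-cluster --region=us-east1 \
#       --jars=gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar
#
# Or inside the session, before any read (requires pyspark):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.config("spark.jars", bq_jar).getOrCreate()
# df = spark.read.format("bigquery").load(
#     "bigquery-public-data.samples.shakespeare")
```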
0
votes
1 answer
Google Dataproc pySpark slow on public BigQuery table
I am trying to work with PySpark on this Google public BigQuery table (table size: 268.42 GB, number of rows: 611,647,042). I set the region of the cluster to US (the same as the BigQuery table), but the code is extremely slow even when using…

frebls
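For the slow-read question, a sketch of pushing work into BigQuery rather than Spark: the connector's `filter` option and a narrow `select` let row filtering and column pruning happen server-side instead of after a full 268 GB scan. The table and column names below are placeholders, not taken from the question.

```python
# Sketch (assumes the spark-bigquery connector is available on the cluster).
# Scanning the whole table and filtering in Spark is the usual cause of
# slowness; the connector can push a row filter and column pruning down to
# BigQuery. Table and column names here are placeholders.
read_options = {
    "table": "bigquery-public-data.samples.wikipedia",
    "filter": "language = 'en'",  # evaluated by BigQuery, not by Spark
}

# df = (spark.read.format("bigquery")
#       .option("filter", read_options["filter"])
#       .load(read_options["table"])
#       .select("title", "views"))  # narrow select -> column pruning
```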
0
votes
0 answers
pyspark - how to run and schedule streaming jobs in dataproc hosted on GCP
I am trying to write PySpark code that streams data from a Delta table and continuously merges it into the final Delta target, with an interval of 10-15 minutes between each cycle.
I have written a simple PySpark script and am submitting the job in…

Rak
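For the scheduling question, a sketch of running the merge on a fixed cadence from inside one long-running job, using a `processingTime` trigger plus `foreachBatch`. The paths and the `upsert_to_target` merge function are hypothetical, and the sketch assumes Delta Lake is installed on the cluster.

```python
# Sketch of a fixed-interval streaming merge. Assumptions: Delta Lake is on
# the cluster, and upsert_to_target / all gs:// paths are placeholders.
trigger_interval = "15 minutes"  # micro-batch cadence

# def upsert_to_target(batch_df, batch_id):
#     # MERGE each micro-batch into the final Delta target.
#     target = DeltaTable.forPath(spark, "gs://bucket/final_target")
#     (target.alias("t")
#            .merge(batch_df.alias("s"), "t.id = s.id")
#            .whenMatchedUpdateAll()
#            .whenNotMatchedInsertAll()
#            .execute())
#
# (spark.readStream.format("delta").load("gs://bucket/source_table")
#       .writeStream
#       .foreachBatch(upsert_to_target)
#       .trigger(processingTime=trigger_interval)
#       .option("checkpointLocation", "gs://bucket/checkpoints/merge")
#       .start())
```

A `processingTime` trigger keeps one job alive and re-runs the batch on the chosen interval, which avoids the cold-start cost of scheduling a fresh Dataproc job every cycle.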
0
votes
0 answers
Problems running Spark on GCP
We run a number of scripts for every release of our platform, and we want to automate these runs with Snakemake. The plan is to fire up a VM on Google Cloud and run Snakemake there, where the locations of the input/output files are read…

irenels
0
votes
1 answer
gcloud dataproc clusters list filter by !=
How do I filter dataproc clusters using a != (not equal to)? I've tried:
gcloud dataproc clusters list --region=us-east4 --project= --filter="labels.disposition!=permanent"
ERROR: (gcloud.dataproc.clusters.list) INVALID_ARGUMENT:…

schirayu
0
votes
1 answer
dataproc create cluster gcloud equivalent command in python
How do I replicate the following gcloud command in python?
gcloud beta dataproc clusters create spark-nlp-cluster \
--region global \
--metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3' \
--worker-machine-type…

Machine Learning
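For the question above, a sketch of the gcloud command's Python equivalent using the google-cloud-dataproc client library (`pip install google-cloud-dataproc`; not stdlib). Field names follow the v1 `Cluster` message; the project, region, worker count, and machine type are placeholders, since the original command is truncated.

```python
# Sketch of the cluster spec as a plain dict, mirroring the gcloud flags in
# the question. Worker count and machine type are placeholders because the
# original command is cut off.
cluster_spec = {
    "project_id": "my-project",  # placeholder
    "cluster_name": "spark-nlp-cluster",
    "config": {
        "gce_cluster_config": {
            "metadata": {
                "PIP_PACKAGES": "google-cloud-storage spark-nlp==2.5.3",
            },
        },
        "worker_config": {
            "num_instances": 2,                   # placeholder
            "machine_type_uri": "n1-standard-4",  # placeholder
        },
    },
}

# Submitting it with the client library (requires google-cloud-dataproc):
# from google.cloud import dataproc_v1
# client = dataproc_v1.ClusterControllerClient(
#     client_options={"api_endpoint": "us-east1-dataproc.googleapis.com:443"})
# operation = client.create_cluster(
#     project_id=cluster_spec["project_id"], region="us-east1",
#     cluster=cluster_spec)
# operation.result()  # blocks until the cluster is running
```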
-1
votes
1 answer
What's the difference between a Dataproc cluster on GKE vs Compute Engine?
We can now create Dataproc clusters using Compute Engine or GKE. What are the major advantages of creating a cluster on GKE vs Compute Engine? We have hit an "insufficient resources in zone" error multiple times while creating clusters on…

Nishit patel