Questions tagged [google-cloud-dataproc]

Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig and Hive service on Google Cloud Platform. The service provides GUI, CLI and HTTP API access modes for deploying/managing clusters and submitting jobs onto clusters. This tag can be added to any questions related to using/troubleshooting Google Cloud Dataproc.

1563 questions
0 votes · 1 answer

Google DataProc Spark - getting "permission denied (publickey)" error when trying to SSH to a worker node

Small cluster: 1 master, 2 workers. I can access all nodes (master + workers) just fine using the gcloud SDK. However, once I'm on the master node and try to ssh to a worker node, I get a "permission denied (publickey)" error. Note that I can ping the node…
sermolin (161 · 1 · 2 · 6)
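A likely cause (hedged; the cluster and zone names below are hypothetical): `gcloud compute ssh` injects your public key via instance metadata, whereas plain `ssh` from the master node has no matching private key on the worker. A sketch of reaching a worker from a workstation with the Cloud SDK:

```shell
# Sketch: build the gcloud invocation (worker/zone names are hypothetical).
# gcloud compute ssh handles key propagation; bare ssh from the master does not.
WORKER="my-cluster-w-0"
ZONE="us-central1-a"
CMD="gcloud compute ssh $WORKER --zone=$ZONE"
echo "$CMD"
```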
0 votes · 1 answer

Has the Google Cloud Dataproc preview image's Spark version changed?

I recently started a Spark cluster on Google Cloud Dataproc using the 'preview' image. According to the documentation, the preview image's Spark version is '2.1.0'; however, running spark-shell --version reveals that the cluster is in fact running…
mjaz (471 · 3 · 9)
0 votes · 1 answer

Creating a simple HTML page using the Dataproc API

I am new to the Google API and I'm trying to connect my web site, which runs on Django on another Google Cloud server, to my Google Dataproc cluster. Nothing but errors so far. Here is my code:
0 votes · 1 answer

Spark on Google Cloud Dataproc: job failures in the last stages

I work with a Spark cluster on Dataproc and my job fails at the end of processing. My data source is text log files in CSV format on Google Cloud Storage (total volume 3.5 TB, 5,000 files). The processing logic is as follows: read files into a DataFrame…
0 votes · 1 answer

How to use params/properties flag values when executing a Hive job on Google Dataproc

I am trying to execute a Hive job on Google Dataproc using the following gcloud command: gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --properties=[bucket1=abcd] gcloud dataproc jobs submit hive --cluster=msm-test-cluster…
abhishek jha (1,065 · 4 · 21 · 41)
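For the question above, one hedged observation: in the `gcloud` reference, the square brackets in `--properties=[KEY=VALUE,…]` are syntax notation, not literal characters, and `--params` (not `--properties`) is what passes variables to the Hive script. A minimal sketch, reusing the cluster and file names from the excerpt:

```shell
# Sketch: pass a Hive variable with --params (no literal brackets on the CLI).
# Inside hive.sql it would typically be read back as ${hivevar:bucket1}.
CMD="gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file=hive.sql --params=bucket1=abcd"
echo "$CMD"
```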
0 votes · 1 answer

Dataproc cluster on Google Cloud

My understanding is that the advantage of running a Dataproc cluster, instead of setting up your own Compute Engine cluster, is that it takes care of installing the storage connector (and other connectors). What else does it do for you?
Alex (19,533 · 37 · 126 · 195)
0 votes · 1 answer

Minimal access requirement for Dataproc initialization scripts

I have a bucket with initialization actions that has the following ACL: deployment_service_user: Owner; dataproc_service_user: Reader. Objects in the bucket have the same ACL. While all users involved in launching that cluster should have the…
chemikadze (815 · 4 · 12)
0 votes · 1 answer

Failed to stop or delete a job in Dataproc on Google Cloud Platform

When I try to delete a Dataproc cluster in Google Cloud Platform, I get the error below: "Failed to stop job b021d29d-acc9-409d-8fca-52363076a63c. Cluster not found." Could anyone help?
0 votes · 2 answers

Google Dataproc Hive instance through a third party tool

I need your help here. I hope to connect my Google Dataproc Hadoop/Hive instance to a third-party tool and started with "Toad for Hadoop". Is it a good choice? Or is there any other tool I could use?
0 votes · 1 answer

configuration for `spark.hadoop.fs.s3` gets applied to `fs.s3a` not `fs.s3`

I have read the answer posted on how to configure S3 access keys for Dataproc, but I find it unsatisfactory. The reason is that when I follow the steps and set the Hadoop conf for spark.hadoop.fs.s3, s3:// paths still have access issues, whereas…
jk-kim (1,136 · 3 · 12 · 20)
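One commonly cited detail behind this question: Hadoop's maintained S3 connector is `s3a`, so the credential keys live under the `fs.s3a.` prefix rather than `fs.s3.`. A sketch (cluster name, main class, jar path, and credentials are all placeholders) of passing them at submit time:

```shell
# Sketch: set S3A credentials as Spark/Hadoop properties (all values are placeholders).
CMD="gcloud dataproc jobs submit spark --cluster=my-cluster --class=com.example.Main --jars=gs://my-bucket/app.jar --properties=spark.hadoop.fs.s3a.access.key=MY_ACCESS_KEY,spark.hadoop.fs.s3a.secret.key=MY_SECRET_KEY"
echo "$CMD"
```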
0 votes · 1 answer

Spark UI available on Dataproc Cluster?

I'm looking to interact with the traditional Spark web UI on default clusters in Dataproc.
deepelement (2,457 · 1 · 25 · 25)
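The usual route (a sketch; cluster and zone names are hypothetical): the YARN and Spark web UIs listen on the cluster's master node, and Google's documented pattern is an SSH tunnel that opens a local SOCKS proxy, with the browser configured to use it:

```shell
# Sketch: open a SOCKS proxy on local port 1080 through the cluster master,
# then browse to e.g. http://my-cluster-m:8088 (YARN) with the proxy enabled.
MASTER="my-cluster-m"
CMD="gcloud compute ssh $MASTER --zone=us-central1-a -- -D 1080 -N"
echo "$CMD"
```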
0 votes · 1 answer

NoSuchMethodError when trying to run Gobblin on Dataproc

I'm trying to run Gobblin on Google Dataproc, but I'm getting this NoSuchMethodError and can't figure out how to solve it. Waiting for job output... ... Exception in thread "main" java.lang.reflect.InvocationTargetException at…
Henrique G. Abreu (17,406 · 3 · 56 · 65)
0 votes · 1 answer

Talend connector to Google Cloud Dataproc

Is it possible to connect Talend to Google Cloud Dataproc? And are there any connectors available for it? On 1 it says it does, but I can't find any documentation related to it. If the above is true, I would also like to know if it's possible to run…
rish0097 (1,024 · 2 · 18 · 39)
0 votes · 2 answers

Give custom job_id to Google Dataproc cluster for running pig/hive/spark jobs

Is there any flag available to give a custom job_id to Dataproc jobs? I am using this command to run Pig jobs: gcloud dataproc jobs submit pig --cluster my_cluster --file my_queries.pig I use similar commands to submit pyspark/hive jobs. This…
abhishek jha (1,065 · 4 · 21 · 41)
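At least in recent `gcloud` releases, the `dataproc jobs submit` commands appear to accept an `--id` flag for exactly this purpose; a hedged sketch reusing the Pig command from the excerpt (the job ID value is made up):

```shell
# Sketch: --id sets a custom job ID (it must be unique within the project).
CMD="gcloud dataproc jobs submit pig --cluster=my_cluster --file=my_queries.pig --id=my-custom-job-id"
echo "$CMD"
```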
0 votes · 2 answers

MapReduce job container killed by Google Cloud Platform [Error code: 143]

I tried to run a MapReduce job on a cluster in Google Cloud Platform using the Python package mrjob as follows: python mr_script.py -r dataproc --cluster-id [CLUSTER-ID] [gs://DATAFILE_FOLDER] I can successfully run the very same script against the…