Questions tagged [dataproc]

130 questions
1
vote
0 answers

Dataproc exit code 247

Running Spark NLP on Dataproc, the code ends abruptly and the only log statement is "Job 'd74196c5-c5e9-4629-a317-20a0f151abf7' completed with exit code 247". What does exit code 247 mean?
1
vote
1 answer

Do I need to restart nodes if I am running Spark on YARN after changing spark-env.sh or spark-defaults?

I am working on changing the Spark conf in order to limit the logs for my Spark Structured Streaming log files. I have figured out the properties to do so, but it is not working right now. Do I need to restart all nodes (name and worker nodes) or is…
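A note on how these files are read may help frame the question: spark-defaults.conf and spark-env.sh are read on the client at spark-submit time, so, as a general rule for Spark on YARN, a newly submitted application picks up the changes without restarting name or worker nodes, while already-running applications keep their old conf. A minimal sketch; the property and job file are illustrative, not taken from the question:

```shell
# spark-defaults.conf / spark-env.sh are sourced when spark-submit launches,
# so changed properties apply to the NEXT submitted application; no node
# restart is needed. The same setting can be passed per job:
spark-submit \
  --conf spark.eventLog.enabled=false \
  my_streaming_job.py   # hypothetical job file
```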
1
vote
1 answer

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist

I am running a Spark job on a Google Dataproc cluster (version 1.4, Spark 2.4.5) that reads a file from a GCS bucket using a regular expression in the path, and I am getting the error below. Exception in thread "main" org.apache.spark.sql.AnalysisException:…
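Worth noting for this class of error: Spark resolves path strings as Hadoop glob patterns (`*`, `?`, `[abc]`, `{a,b}`), not full regular expressions, so a regex-style path typically matches nothing and raises "Path does not exist". Python's `fnmatch` mimics the glob semantics; the file names below are made up:

```python
import fnmatch

# Hadoop path globs behave like shell globs, not regexes:
# '*' matches any run of characters, '[1-6]' a character class.
files = ["2020-01-a.csv", "2020-07-b.csv", "part-0001.csv"]
matched = [f for f in files if fnmatch.fnmatch(f, "2020-0[1-6]-*.csv")]
# matched keeps only the January file
```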
1
vote
0 answers

Plugin problem on Zeppelin using Dataproc

I am working with Dataproc and trying to save my notebooks on GCS and GitHub using the correct variables. But it is not working. I am using Zeppelin component. I got this error: INFO [2020-07-23 19:54:59,790] ({qtp684874119-16}…
0
votes
1 answer

PyFlink job encountering "No module named 'google'" error when using FlinkKafkaConsumer

I'm working on a PyFlink job that reads data from a Kafka topic using the FlinkKafkaConsumer connector. However, I'm encountering a persistent issue related to the google module when trying to run the job on my Flink cluster. The job works fine…
0
votes
0 answers

Dataproc Serverless for Spark Batch runs into a timeout after 4 hours

How can I increase the time limit? Logs show "Found 1 invalidated leases" and "Task srvls-batch-d702cb8b-1d45-44d2-bf2e-6bf6275f66bf lease grant was revoked, cancelling work." I tried researching the properties but couldn't find one for a time limit.
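For reference, the 4-hour cutoff matches the default batch TTL of Dataproc Serverless. A sketch of raising it via the `dataproc.batch.ttl` runtime property; the property name should be checked against the current Dataproc Serverless documentation, and the job file and region are placeholders:

```shell
# Sketch (property name to be verified): Dataproc Serverless batches default
# to a 4-hour TTL; the batch TTL runtime property is meant to raise it.
gcloud dataproc batches submit pyspark job.py \
  --region=us-central1 \
  --properties=dataproc.batch.ttl=8h
```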
0
votes
1 answer

Using Spark Dataproc to write in Pub/Sub

I want to use Dataproc Spark to run 2 SQL files on BigQuery, executed each minute, and then write the results to Pub/Sub. I am not sure whether I can use these two technologies together. Has anyone who has already used Dataproc with Pub/Sub…
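The two can be combined: Spark has no built-in Pub/Sub sink, but a foreachBatch (or per-partition) function can publish rows with the google-cloud-pubsub client. A minimal sketch of that publishing step, with the client injected so a fake stands in for `pubsub_v1.PublisherClient()`; every name here is illustrative:

```python
import json

def publish_batch(rows, publisher, topic_path):
    """Serialize each row (a dict) and hand it to the publisher.

    In production `publisher` would be pubsub_v1.PublisherClient() and
    topic_path a real "projects/<p>/topics/<t>" string.
    """
    for row in rows:
        publisher.publish(topic_path, data=json.dumps(row).encode("utf-8"))

class FakePublisher:
    """Test double that records what would be published."""
    def __init__(self):
        self.messages = []
    def publish(self, topic, data):
        self.messages.append((topic, data))

fake = FakePublisher()
publish_batch([{"id": 1}], fake, "projects/p/topics/t")
```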
0
votes
0 answers

PySpark Job on Dataproc Throws IOException but Still Completes Successfully

I'm running a PySpark job on Google Cloud Dataproc, which uses structured streaming with a trigger of 'once'. The job reads Parquet data from a raw layer (a GCS bucket), applies certain business rules, and then writes the data in Delta format to a…
0
votes
1 answer

Does Google provide technical support for Dataproc's optional components, e.g. Ranger?

Does Google provide technical support for Dataproc's optional components, e.g. Ranger? If yes, can someone leave a link to verify?
0
votes
0 answers

How to pass a variable between two steps in a Dataproc workflow?

I need to get the result of a calculation from the previous step as a variable in a Dataproc workflow (something like xcom_pull in Airflow). For example: my Cloud Function initiates a workflow which contains several consecutive steps. The first step receives a…
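Dataproc workflow templates have no xcom equivalent; the usual workaround is to persist the value to shared storage in one step and read it in the next. A sketch of the hand-off pattern, with a local file standing in for a `gs://bucket/...` object that would be written with the google-cloud-storage client:

```python
import os
import tempfile

# Workflow steps share no in-memory state, so step 1 writes its result to
# shared storage and step 2 reads it back as its input. A temp file stands
# in for a GCS object here.
handoff = os.path.join(tempfile.mkdtemp(), "step1_result.txt")

# Step 1: persist the computed value.
with open(handoff, "w") as f:
    f.write("42")

# Step 2: read it back.
with open(handoff) as f:
    value = f.read()
```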
0
votes
2 answers

In Dataproc, should the file prefix be used when applying a property to a job?

The documentation explicitly states: "When applying a property to a job, the file prefix is not used." However, the example given there is inconsistent with this. This is what the page says: ...However, many of these properties can also be…
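The convention the docs describe can be shown side by side: at cluster creation a property carries a file prefix naming the config file it lands in (e.g. `spark:`), while at job submission the prefix is dropped. Cluster name, region, and job file below are placeholders:

```shell
# Cluster creation: file prefix names the target config file
# (spark: -> spark-defaults.conf).
gcloud dataproc clusters create my-cluster --region=us-central1 \
  --properties=spark:spark.executor.memory=4g

# Job submission: same property, no file prefix.
gcloud dataproc jobs submit pyspark job.py --cluster=my-cluster \
  --region=us-central1 --properties=spark.executor.memory=4g
```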
0
votes
0 answers

PySpark SparkSession.stop() stuck

We have a PySpark job running in a Google Dataproc cluster. Intermittently we observe that the job gets stuck when we run SparkSession.stop(). We enabled debug logs, and below are the logs we observe. Even after our code's task is completed, the…
0
votes
0 answers

NodeInitializationAction has no "executableFile" field

I'm using google.cloud dataproc_v1 and trying to create a Dataproc cluster. I have to clone a GitHub repo into the cluster, so using initialization actions I'm trying to run a bash script stored in a GCP Storage bucket. I'm seeing this error when…
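One thing to check (an assumption about the cause, since the request body is not shown): the Python client expects proto field names in snake_case, so the init action key is `executable_file`, not the JSON-style `executableFile`. Bucket and script name below are hypothetical:

```python
# Sketch: in a dataproc_v1 request body the init-action field is snake_case;
# camelCase "executableFile" is rejected with "has no field" errors.
cluster_config = {
    "initialization_actions": [
        {"executable_file": "gs://my-bucket/clone_repo.sh"}
    ]
}
```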
0
votes
2 answers

gcloud dataproc clusters list --region=us-central1 shows a list of 0

We have two clusters in that region; however, when I run this command in the CLI it returns a list of 0. Even when I describe the cluster I do not see "region", but the zone is us-central1-a. Ultimately I am trying to stop the cluster, but since the region is not…
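A likely explanation (an assumption, since the describe output is truncated): clusters created without an explicit regional endpoint live in the legacy `global` region, so nothing shows under us-central1 even though the VMs sit in zone us-central1-a. The cluster name below is a placeholder:

```shell
# Legacy clusters land in the "global" region; list and stop them there.
gcloud dataproc clusters list --region=global
gcloud dataproc clusters stop my-cluster --region=global
```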
0
votes
1 answer

(GKE Dataproc) Global region is not supported for Dataproc Virtual Cluster

I am trying to create a Dataproc cluster on GKE. I was following the steps on the official GCP website. After running the following command in the gcloud CLI: DP_CLUSTER="test-gke" \ REGION="asia-east2-a" \ GKE_CLUSTER="airflow-cluster" \ …
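One detail in the command stands out: `asia-east2-a` is a zone, not a region, and Dataproc-on-GKE virtual clusters require a real region (hence the complaint that `global` is not supported). A sketch of the corrected variable:

```shell
# "asia-east2-a" is a zone; the virtual cluster needs the enclosing region.
REGION="asia-east2"
```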