
I have a Dataproc job that outputs some logs during execution. I can see those logs in the Job output.

My cluster is created according to the documentation with the following parameters:

dataproc:jobs.file-backed-output.enable=true
dataproc:dataproc.logging.stackdriver.enable=true
dataproc:dataproc.logging.stackdriver.job.driver.enable=true
dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true
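
For reference, these properties are passed comma-separated in a single --properties flag at cluster creation; a sketch (the cluster name my-cluster and the region are illustrative):

gcloud dataproc clusters create my-cluster --region=us-central1 \
    --properties='dataproc:jobs.file-backed-output.enable=true,dataproc:dataproc.logging.stackdriver.enable=true,dataproc:dataproc.logging.stackdriver.job.driver.enable=true,dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true'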

I can see all the system logs in Logging, but not the output from my job. The most I found was a URL to the rolling output file (not even a concrete file).

Is there any chance I can forward job output to Logging?

According to the documentation, the cluster can be created with spark:spark.submit.deployMode=cluster so that the output is logged to the YARN user logs group. But whenever I do that, my job fails with:

21/03/15 16:20:16 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
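
(For reference, this error is typically raised by the YARN ApplicationMaster when the driver's main method exits or times out without ever creating a SparkContext, which cluster deploy mode expects. A minimal PySpark driver that initializes one up front might look like the following; the app name is illustrative:)

from pyspark.sql import SparkSession

# Initialize the Spark context as the first thing the driver does;
# in cluster deploy mode the ApplicationMaster waits for this.
spark = SparkSession.builder.appName("log-exp-job").getOrCreate()

# Driver output from here on should land in the YARN container logs.
print(spark.sparkContext.parallelize(range(100)).sum())

spark.stop()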

1 Answer

I was able to create a cluster and submit jobs as follows. I then went to Stackdriver and refreshed the page; after refreshing, I could see the Cloud Dataproc Job logging filter.

I also noticed that for both jobs I ran, the job output was logged under 'Any Log Level'. I'm not sure if you are using any log-level filtering.

Are you able to see Cloud Dataproc Job in the logging filter after passing dataproc:dataproc.logging.stackdriver.job.driver.enable=true and refreshing the page?

Are you using one of the supported image versions?

Repro steps:

Cluster Creation:

gcloud dataproc clusters create log-exp --region=us-central1 \
--properties 'dataproc:dataproc.logging.stackdriver.job.driver.enable=true'
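
(If the image version matters, as asked above, it can be pinned explicitly with --image-version; 2.0 here is just an illustrative version:)

gcloud dataproc clusters create log-exp --region=us-central1 \
    --image-version=2.0 \
    --properties='dataproc:dataproc.logging.stackdriver.job.driver.enable=true'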

Job Submission: PySpark

gcloud dataproc jobs submit pyspark \
    gs://dataproc-examples/pyspark/hello-world/hello-world.py \
    --cluster=log-exp  \
    --region=us-central1

Job Submission: Spark

gcloud dataproc jobs submit spark \
    --cluster=log-exp \
    --region=us-central1 \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 100
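
After a job finishes, one way to verify its output reached Logging is to query it from the CLI; a sketch, assuming the cloud_dataproc_job resource type and the dataproc.job.driver log name:

gcloud logging read \
    'resource.type="cloud_dataproc_job" AND log_name:"dataproc.job.driver"' \
    --limit=10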

[Screenshots: Cloud Dataproc Job filter; logging level]