I have multiple Pig jobs in a GCP Dataproc workflow template, with dependencies set up as below:
export env=dev
export REGION=us-east4
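# create an empty workflow template named test1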
gcloud dataproc workflow-templates create test1 --region=$REGION
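# run the jobs on an existing cluster, selected by its goog-dataproc-cluster-uuid label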
gcloud dataproc workflow-templates set-cluster-selector test1 \
--region=$REGION \
--cluster-labels=goog-dataproc-cluster-uuid=XXXXXXXXXXXXXXXXXXX
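# add the three Pig jobs; pig_job2 starts after pig_job1, and pig_job3 starts after pig_job2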
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job1.sh \
--region=$REGION \
--step-id=pig_job1 \
--workflow-template=test1
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job2.sh \
--region=$REGION \
--step-id=pig_job2 \
--start-after=pig_job1 \
--workflow-template=test1
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job3.sh \
--region=$REGION \
--step-id=pig_job3 \
--start-after=pig_job2 \
--workflow-template=test1
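# instantiate (run) the workflow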
gcloud dataproc workflow-templates instantiate test1 --region=$REGION
Is there any provision to execute a GCP Dataproc workflow from its point of failure?
What I mean is: suppose step-id=pig_job2 fails for some reason; is there any way to execute this workflow starting from step-id=pig_job2 only (without creating a new workflow)?
I tried the approach from https://stackoverflow.com/questions/71716824/gcp-workflows-easy-way-to-re-run-failed-execution, but it was not useful here.
I am expecting step-id=pig_job2 to be executed directly, followed by the remaining jobs as per their dependencies.
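For illustration only, something like the following pseudo-invocation is what I am hoping for; the --start-from flag below is made up and does not exist in gcloud, it just shows the desired behaviour:

gcloud dataproc workflow-templates instantiate test1 \
--region=$REGION \
--start-from=pig_job2   # hypothetical flag: resume from the failed step and honour the remaining dependencies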