I was able to run the non-templated Beam job directly on the GCP Dataflow runner using the command below:

java -jar <jar_name> 
--runner=DataFlowRunner 
--gcpTempLocation=gs://some/gcs/location 
--stagingLocation=gs://some/gcs/location/stage 
--tempLocation=gs://some/gcs/location/temp 
--region=<region_name> 
--project=<project_name> 
--subnetwork=<subnet_name> 
--jobName=<job_name>

I wanted to turn the same job into a template, so I used the command below to stage the template in a GCS bucket:

java -jar <jar_name> 
--runner=DataFlowRunner 
--gcpTempLocation=gs://some/gcs/location 
--stagingLocation=gs://some/gcs/location/stage 
--templateLocation=gs://some/gcs/location/templates/<job_name>
--region=<region_name> 
--project=<project_name>

However, I receive the error below while creating the job template:

18:11:05.004 [main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Template successfully created.
Exception in thread "main" java.lang.UnsupportedOperationException: The result of template creation should not be used.
    at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getJobId(DataflowTemplateJob.java:41)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:559)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:540)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:324)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:253)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:212)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:206)
    at com.gojek.de.jobs.EventFilterJob.main(EventFilterJob.java:72)

When I then try to run a Dataflow job from the GCS template, Dataflow cannot launch the job from the template.

I can see that the template was created in the GCS bucket, but I am not sure why the job run failed. Also, can a non-template Beam job be converted directly into a template job?

Note: I cannot run the Maven command given in the documentation because our project is Gradle-based.

Siddhanta Rath
Hi @Siddhanta Rath, could you clarify how you are running the commands in Gradle, and is there any specific documentation you are following for running templated jobs in GCP Dataflow using Gradle? – Shipra Sarkar Sep 19 '22 at 13:12

1 Answer

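When you create a template, you cannot call DataflowPipelineJob::waitUntilFinish, because no job is attached to that run. That seems to be the case here: the stack trace shows EventFilterJob.main calling waitUntilFinish right after "Template successfully created."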

Take a look at the WordCount example to see a working template.
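One way to avoid the waitUntilFinish call during template creation is to check whether the run was started with --templateLocation and skip the wait in that case. The sketch below is only an illustration, not the asker's actual code: the class TemplateRunGuard and the helper isTemplateStagingRun are hypothetical names, and the Beam calls appear only in comments so the snippet compiles without the Beam SDK. If the job already builds a DataflowPipelineOptions object, checking its templateLocation option (I believe via getTemplateLocation()) should serve the same purpose.

```java
import java.util.Arrays;

public class TemplateRunGuard {

    private static final String TEMPLATE_FLAG = "--templateLocation=";

    // Hypothetical helper: returns true when the job was launched with a
    // non-empty --templateLocation=... argument, i.e. the run only stages
    // a template and no Dataflow job is attached to the result.
    static boolean isTemplateStagingRun(String[] args) {
        return Arrays.stream(args)
                .anyMatch(arg -> arg.startsWith(TEMPLATE_FLAG)
                        && arg.length() > TEMPLATE_FLAG.length());
    }

    public static void main(String[] args) {
        boolean staging = isTemplateStagingRun(args);

        // In the real job, this flag would guard the Beam calls, roughly:
        //   PipelineResult result = pipeline.run();
        //   if (!staging) {
        //       result.waitUntilFinish();
        //   }
        if (staging) {
            System.out.println("Template staging run: skip waitUntilFinish()");
        } else {
            System.out.println("Regular run: waitUntilFinish() can be called");
        }
    }
}
```

With this guard, the same jar can be used both for direct runs (the first command in the question) and for staging the template (the second command) without hitting the UnsupportedOperationException.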

Bruno Volpato