
My Dataflow job fails with the exception below when I pass the staging, temp, and output GCS bucket locations as parameters.

Java code:

final String[] used = Arrays.copyOf(args, args.length + 1);
used[used.length - 1] = "--project=OVERWRITTEN";
final T options = PipelineOptionsFactory.fromArgs(used).withValidation().as(clazz);
options.setProject(PROJECT_ID); 
options.setStagingLocation("gs://abc/staging/"); 
options.setTempLocation("gs://abc/temp"); 
options.setRunner(DataflowRunner.class); 
options.setGcpTempLocation("gs://abc");

The error:

INFO: Staging pipeline description to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/staging/
May 10, 2018 11:56:35 AM org.apache.beam.runners.dataflow.util.PackageUtil tryStagePackage
INFO: Uploading <42088 bytes, hash E7urYrjAOjwy6_5H-UoUxA> to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/staging/pipeline-E7urYrjAOjwy6_5H-UoUxA.pb
Dataflow SDK version: 2.4.0
May 10, 2018 11:56:38 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Printed job specification to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/templates/DataValidationPipeline
May 10, 2018 11:56:40 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Template successfully created.
Exception in thread "main" java.lang.NullPointerException
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:501)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:477)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:312)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:248)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:202)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:195)
    at com.example.DataValidationPipeline.main(DataValidationPipeline.java:66)
  • Would you mind updating this with the full command line command you used? – Alex Amato May 11 '18 at 22:56
  • Running from Eclipse and setting arguments in code. – Mohammed Niaz May 14 '18 at 06:24
  • final String[] used = Arrays.copyOf(args, args.length + 1); used[used.length - 1] = "--project=OVERWRITTEN"; final T options = PipelineOptionsFactory.fromArgs(used).withValidation().as(clazz); options.setProject(PROJECT_ID); options.setStagingLocation("gs://abc/staging/"); options.setTempLocation("gs://abc/temp"); options.setRunner(DataflowRunner.class); options.setGcpTempLocation("gs://abc"); – Mohammed Niaz May 14 '18 at 06:27
  • Hi, would you mind providing a bit more context. Can you provide the full code and pom.xml files to see which versions of dependencies you are using. – Alex Amato May 14 '18 at 22:46
  • @MohammedNiaz - Hello, is the issue resolved? If yes, can you share the solution? – MonicaPC Jun 20 '18 at 23:57
  • It seems that Dataflow Monitoring Interface error messages are somewhat generic. To get more in-depth details about the errors, you can use the Stackdriver logs (open them via the Stackdriver icon in the upper-right corner of the selected job's logs in the Google Cloud Console Dataflow page), or see the Stackdriver documentation: https://cloud.google.com/logging/docs/view/overview – MonicaPC Jun 29 '18 at 20:31

3 Answers


I was also facing the same issue; the error was thrown at p.run().waitUntilFinish(). I then tried the following code:

   PipelineResult result = p.run();
   System.out.println(result.getState().hasReplacementJob());
   result.waitUntilFinish();

This threw the following exception:

    java.lang.UnsupportedOperationException: The result of template creation should not be used.
    at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getState(DataflowTemplateJob.java:67)

To fix the issue, I then used the following code:

    PipelineResult result = pipeline.run();
    try {
        result.getState();
        result.waitUntilFinish();
    } catch (UnsupportedOperationException e) {
        // expected during template creation: the DataflowTemplateJob
        // result must not be inspected, so skip the wait
    } catch (Exception e) {
        e.printStackTrace();
    }
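
If you would rather not silently swallow the exception, a minimal sketch of an explicit type check instead (assuming the Dataflow runner classes are on the classpath; this is the same idea as the Scala answer below):

    import org.apache.beam.runners.dataflow.util.DataflowTemplateJob;
    import org.apache.beam.sdk.PipelineResult;

    // ...
    PipelineResult result = pipeline.run();
    // During template creation the runner returns a DataflowTemplateJob,
    // whose getState()/waitUntilFinish() throw; only wait on real jobs.
    if (!(result instanceof DataflowTemplateJob)) {
        result.waitUntilFinish();
    }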
  • This resolved my problem, thank you. But I am asking myself if silently ignoring an error could be potentially dangerous? – sg_rs Oct 05 '22 at 15:29

As displayed in the official Flex Template sample, there is a comment saying: // For a Dataflow Flex Template, do NOT waitUntilFinish().

The same applies if you call any of those methods on the runner when you pass the --templateRunner argument.

If you change the pipeline to pipeline.run();, it is not going to fail.
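
A minimal sketch of that template-safe pattern (variable name assumed):

    // Building a template: just run, and do NOT block on the result.
    PipelineResult result = pipeline.run();
    // The result here is a DataflowTemplateJob; calling getState() or
    // similar methods on it throws UnsupportedOperationException.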

The issue is still flagged as open in Apache Beam: https://github.com/apache/beam/issues/20106


I was running into the java.lang.UnsupportedOperationException: The result of template creation should not be used. problem today as well, and I tried to fix it by first checking whether the job was of type DataflowTemplateJob:

  val (sc, args) = ContextAndArgs(cmdlineArgs)
  // ...
  val result = sc.run()
  if (!result.isInstanceOf[DataflowTemplateJob]) result.waitUntilFinish()

I think this should work for bare Java jobs, but if you use Scio, the result will be some anonymous type, so in the end I had to use the try/catch version as well.

    try {
      val result = sc.run().waitUntilFinish()
    } catch {
      case _: UnsupportedOperationException  => // this happens during template creation
    }