
I'm trying to launch a Dataflow job on GCP using Apache Beam 0.6.0. I am building an uber jar with the shade plugin because I cannot launch the job using "mvn exec:java". I'm including this dependency:

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>0.6.0-SNAPSHOT</version>
</dependency>

I am getting the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: Unknown 'runner' specified 'DataflowRunner', supported pipeline runners [DirectRunner]
    at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1609)
    at org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:104)
    at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:289)
    at com.disney.dtss.desa.tools.SpannerSinkTest.main(SpannerSinkTest.java:116)
Caused by: java.lang.ClassNotFoundException: DataflowRunner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1595)

Am I missing something else?

Vadim Kotov
Guy Molinari
  • That is definitely the expected output if the DataflowRunner is not registered. Can you share anything more about your pom.xml, your mvn invocation, or perhaps a listing of the contents of your uber jar and how you invoke it? – Kenn Knowles Mar 21 '17 at 20:42
  • I'm having the same issue. It works fine when I start the pipeline though `mvn compile exec:java`, when I build jar it fails. The uberjar contains the necessary classes. – Paweł Szczur Apr 25 '17 at 12:55
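One thing that may be worth checking when the classes are present in the uber jar but the runner is still not found: Beam discovers runners through Java's ServiceLoader, which reads registration files under META-INF/services. If the shade plugin keeps only one copy of a service file when several dependencies provide it, only DirectRunner may end up registered, which would match the "supported pipeline runners [DirectRunner]" message. A minimal sketch of a shade configuration that merges those files, assuming a standard maven-shade-plugin setup (the plugin version and any other settings are left out and would come from your own pom):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services files from all dependencies
               instead of letting one overwrite the others -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After rebuilding, listing the jar contents and inspecting the files under META-INF/services should show whether the Dataflow runner's registrar is included.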

2 Answers


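Try running with the dataflow-runner profile enabled:

mvn compile exec:java -Dexec.mainClass=YourMainClass -Pdataflow-runner

Note the -Pdataflow-runner flag at the end of the command. Replace YourMainClass with the fully qualified name of your main class.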

ntsd
  • In `pom.xml`, if the dependency is defined as part of a profile, make sure to specify the profile for the `mvn` command. The default WordCount example from Apache Beam does this for the `DataflowRunner`. If you don't care about profiles, just move the dependency definition to the `<dependencies>` section of the pom file. – Andrew Nguonly Feb 16 '18 at 23:16
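For reference, a profile of the kind the comment describes looks roughly like the fragment below. This is a sketch modeled on the Beam WordCount example's pom.xml; it assumes a `beam.version` property is defined elsewhere in the pom, and the exact contents of your own profile may differ.

```xml
<profiles>
  <profile>
    <!-- Activated on the command line with -Pdataflow-runner -->
    <id>dataflow-runner</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
        <!-- Assumes beam.version is set in the pom's <properties> section -->
        <version>${beam.version}</version>
        <scope>runtime</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Because the dependency lives inside the profile, it is only on the classpath when the profile is active, which is why a build without -Pdataflow-runner sees only DirectRunner.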

Following @Andrew Nguonly's comment, I copied the DataflowRunner dependency to the outer scope (the top-level <dependencies> tag) of the pom.xml file.

Basically, I added this:

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>${beam.version}</version>
  <scope>runtime</scope>
</dependency>

I placed it just before the closing </dependencies> tag in the pom.xml from the Beam WordCount example.

jurl
  • For VSCode users, the above method might be the best bet as there isn't yet a clean way to switch profiles: https://github.com/microsoft/vscode-maven/issues/465 – nomadic_squirrel Jul 13 '21 at 22:06